Sql-Server

Query to classify each column value into two categories: Common, Not_Common

  • August 17, 2021

I have a table with the following structure and data:

create table PTEST
(
 col_name  VARCHAR(50),
 col_value VARCHAR(50)
)

   COL_NAME    COL_VALUE
  -----------------------
    first       apple
    first       banana
    second      apple
    second      banana
    second      orange
    third       apple
    third       banana

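What I want to do is classify each value in the col_value column into two categories: Common and Not common.

A value is considered 'Common' if it appears in every col_name, so apple is common because it appears in col_name = first, col_name = second and col_name = third. The same goes for banana. Orange is not common, because it only appears in col_name = second.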

The desired output would look like this:

   COL_NAME   COL_VALUE   STATUS
  ---------------------------------
   first       apple       Common
   first       banana      Common
   second      banana      Common
   second      apple       Common
   second      orange      Not common
   third       apple       Common
   third       banana      Common

The query I wrote for this is:

select col_name,
      col_value,
      case
        when count_col = count_val then
         'Common'
        else
         'Not common'
      end STATUS
 from (select t.col_name,
              count(distinct t.col_name) over() count_col,
              t.col_value,
              count(t.col_value) over(partition by t.col_value) count_val
         from PTEST t)

I was wondering if there are better ways to do that.

Thanks in advance.

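Two ways of doing this are shown below. All of the code below is available, with plans, in a SQL Server fiddle here, and there is a performance analysis at the end.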

The table:

CREATE TABLE ptest
(
 col_name  VARCHAR (50) NOT NULL,
 col_value VARCHAR (50) NOT NULL,
 
 CONSTRAINT name_value_uq UNIQUE (col_name, col_value)
);

Populate it:

INSERT INTO ptest VALUES
('first',  'apple'),
('first',  'banana'),
('second', 'apple'),
('second', 'banana'),
('third',  'apple'),
('third',  'banana'),
('third',  'orange');

First way:

First, we want to know how many times each fruit appears in the whole table.

SELECT 
 col_name,
 col_value,
 COUNT(col_value) OVER (PARTITION BY col_value) AS cnt
FROM
 ptest;

Result:

col_name    col_value   cnt 
  first        apple     3 
  first       banana     3 
 second        apple     3 
 second       banana     3 
  third        apple     3 
  third       banana     3 
  third       orange     1 
7 rows

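There are several ways to find the fruits that appear fewer times than the maximum value of cnt (3), which is how you define common, so we can see at a glance that orange is uncommon.

So, I'm using CTEs to do this: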

WITH cte1 AS
(
 SELECT 
   col_name,
   col_value,
   COUNT(col_value) OVER (PARTITION BY col_value) AS cnt
   -- COUNT(col_value) OVER (PARTITION BY col_name ORDER BY col_value)
 FROM
   ptest
),
cte2 AS
(
 SELECT MAX (cnt) AS mcnt FROM cte1
)
SELECT * FROM cte1 WHERE cnt < (SELECT mcnt FROM cte2);

Result:

col_name    col_value   cnt
  third       orange     1

There you go!

To get closer to your own original query (which doesn't work, see the fiddle), you can do this (again, in the fiddle):

WITH cte1 AS
(
 SELECT 
   col_name,
   col_value,
   COUNT(col_value) OVER (PARTITION BY col_value) AS cnt
   -- COUNT(col_value) OVER (PARTITION BY col_name ORDER BY col_value)
 FROM
   ptest
),
cte2 AS
(
 SELECT MAX (cnt) AS mcnt FROM cte1
)
SELECT 
 col_name,
 col_value,
 CASE
   WHEN cnt < (SELECT mcnt FROM cte2) THEN 'Uncommon'
   ELSE 'Common'
 END AS status
FROM cte1;

This gives the same classification: only orange is marked Uncommon, and every other row is Common.
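One caveat about comparing against MAX(cnt): it matches the definition in the question ("appears in every col_name") only when at least one value really does appear under every col_name. A minimal sketch of a variant that compares against the number of distinct col_name values instead is shown below. It assumes the ptest table defined above, including its UNIQUE (col_name, col_value) constraint, and uses the 'Not common' label from the question; the CTE name counted is only illustrative.

```sql
-- Sketch: a value is 'Common' only if it appears under every col_name.
-- Assumes UNIQUE (col_name, col_value), so the number of rows per col_value
-- equals the number of distinct col_name values that contain it.
WITH counted AS
(
  SELECT
    col_name,
    col_value,
    COUNT(*) OVER (PARTITION BY col_value) AS cnt
  FROM
    ptest
)
SELECT
  col_name,
  col_value,
  CASE
    -- Scalar subquery instead of COUNT(DISTINCT ...) OVER (),
    -- which SQL Server does not accept.
    WHEN cnt = (SELECT COUNT(DISTINCT col_name) FROM ptest) THEN 'Common'
    ELSE 'Not common'
  END AS status
FROM counted
ORDER BY col_name, col_value;
```

With the seven sample rows above, there are three distinct col_name values, so apple and banana (cnt = 3) come out as Common and orange (cnt = 1) as Not common.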

Second way:

If you're running an antique without window functions (or a recent version of MySQL :-) ), you can also do this:

SELECT * FROM
(
 SELECT
   col_value, COUNT(col_value) AS cnt
 FROM 
   ptest
 GROUP BY col_value
) AS t
WHERE cnt < 
(
 SELECT MAX(cnt) FROM 
 (
   SELECT
     col_value, COUNT(col_value) AS cnt
   FROM 
     ptest
   GROUP BY col_value
 ) AS u
);

Result:

col_value   cnt
  orange     1

And there you go again!

You asked in the question:
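The query above lists only the uncommon values. If the per-row status from the question is wanted without window functions, one possible sketch is to join the grouped counts back to the table. It assumes the same ptest table; the aliases p, t and u are illustrative.

```sql
-- Sketch: per-row status using only GROUP BY, a join and a scalar subquery.
SELECT
  p.col_name,
  p.col_value,
  CASE
    WHEN t.cnt <
         (
           -- Highest number of occurrences of any single value
           SELECT MAX(u.cnt)
           FROM
           (
             SELECT COUNT(col_value) AS cnt
             FROM ptest
             GROUP BY col_value
           ) AS u
         )
      THEN 'Uncommon'
    ELSE 'Common'
  END AS status
FROM
  ptest AS p
JOIN
(
  -- Occurrences of each value across the whole table
  SELECT col_value, COUNT(col_value) AS cnt
  FROM ptest
  GROUP BY col_value
) AS t
  ON t.col_value = p.col_value
ORDER BY p.col_name, p.col_value;
```

This returns all seven rows, with only the orange row marked Uncommon.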

I was wondering if there are better ways to do that.

So, I added the following lines at the bottom of the fiddle (documented here):

SET STATISTICS PROFILE ON;  
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

and finally

SET SHOWPLAN_ALL ON;

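It doesn't seem possible to get very fine-grained timings from db<>fiddle, but the plans are interesting.

The window function query produces the following plan (23 rows):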
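If you want to reproduce the plans outside the fiddle, keep in mind that SET SHOWPLAN_ALL has to be the only statement in its batch, and that while it is ON the statements that follow are compiled but not executed. A minimal sketch, assuming a client such as SSMS or sqlcmd that recognises GO as a batch separator (GO is not itself T-SQL):

```sql
SET SHOWPLAN_ALL ON;
GO

-- Not executed: SQL Server returns the estimated plan rows instead.
SELECT
  col_name,
  col_value,
  COUNT(col_value) OVER (PARTITION BY col_value) AS cnt
FROM
  ptest;
GO

-- Switch plan-only mode off again so later statements actually run.
SET SHOWPLAN_ALL OFF;
GO
```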

|--Nested Loops(Inner Join, WHERE:([Expr1003]<[Expr1008]))  1   2   1   Nested Loops    Inner Join  WHERE:([Expr1003]<[Expr1008])       7   0   2.926E-05   47  0.02971825  [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_name], [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value], [Expr1003]     PLAN_ROW    False   1
          |--Stream Aggregate(DEFINE:([Expr1008]=MAX([Expr1007]))) 1   3   2   Stream Aggregate    Aggregate       [Expr1008]=MAX([Expr1007])  1   0   4.7E-06 11  0.01484579  [Expr1008]      PLAN_ROW    False   1
          |    |--Nested Loops(Inner Join) 1   4   3   Nested Loops    Inner Join          7   0   0.0001227688    11  0.01484109  [Expr1007]      PLAN_ROW    False   1
          |         |--Table Spool 1   5   4   Table Spool Lazy Spool          3   0   0   36  0.01471354  [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value]     PLAN_ROW    False   1
          |         |    |--Segment    1   6   5   Segment Segment [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value]     7   0   1.5944E-05  36  0.0146976   [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value], [Segment1011]      PLAN_ROW    False   1
          |         |         |--Sort(ORDER BY:([fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value] ASC))  1   7   6   Sort    Sort    ORDER BY:([fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value] ASC)      7   0.01126126  0.0001306923    36  0.01468165  [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value]     PLAN_ROW    False   1
          |         |              |--Index Scan(OBJECT:([fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[name_value_uq])) 1   8   7   Index Scan  Index Scan  OBJECT:([fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[name_value_uq])    [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value] 7   0.003125    0.0001647   36  0.0032897   [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value]     PLAN_ROW    False   1
          |         |--Nested Loops(Inner Join, WHERE:((1)))   1   9   4   Nested Loops    Inner Join  WHERE:((1))     2.333333    0   1.5944E-06  36  3.1888E-06  [Expr1007]      PLAN_ROW    False   4
          |              |--Compute Scalar(DEFINE:([Expr1007]=CONVERT_IMPLICIT(int,[Expr1012],0))) 1   10  9   Compute Scalar  Compute Scalar  DEFINE:([Expr1007]=CONVERT_IMPLICIT(int,[Expr1012],0))  [Expr1007]=CONVERT_IMPLICIT(int,[Expr1012],0)   1   0   1.5944E-07  36  1.75384E-06 [Expr1007], [Expr1007]      PLAN_ROW    False   4
          |              |    |--Stream Aggregate(DEFINE:([Expr1012]=Count(*)))    1   11  10  Stream Aggregate    Aggregate       [Expr1012]=Count(*) 1   0   1.5944E-06  36  1.5944E-06  [Expr1012]      PLAN_ROW    False   4
          |              |         |--Table Spool  1   12  11  Table Spool Lazy Spool          2.333333    0   0   36  0   [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value]     PLAN_ROW    False   4
          |              |--Table Spool    1   13  9   Table Spool Lazy Spool          2.333333    0   0   36  0   [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value]     PLAN_ROW    False   4
          |--Nested Loops(Inner Join)  1   14  2   Nested Loops    Inner Join          7   0   0.0001227688    47  0.0148411   [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_name], [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value], [Expr1003]     PLAN_ROW    False   1
               |--Table Spool  1   15  14  Table Spool Lazy Spool          3   0   0   43  0.01471355  [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_name], [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value]     PLAN_ROW    False   1
               |    |--Segment 1   16  15  Segment Segment [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value]     7   0   1.5944E-05  43  0.0146976   [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_name], [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value], [Segment1013]      PLAN_ROW    False   1
               |         |--Sort(ORDER BY:([fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value] ASC))   1   17  16  Sort    Sort    ORDER BY:([fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value] ASC)      7   0.01126126  0.0001306993    43  0.01468166  [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_name], [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value]     PLAN_ROW    False   1
               |              |--Index Scan(OBJECT:([fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[name_value_uq]))  1   18  17  Index Scan  Index Scan  OBJECT:([fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[name_value_uq])    [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_name], [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value] 7   0.003125    0.0001647   43  0.0032897   [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_name], [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value]     PLAN_ROW    False   1
               |--Nested Loops(Inner Join, WHERE:((1)))    1   19  14  Nested Loops    Inner Join  WHERE:((1))     2.333333    0   1.5944E-06  43  3.1888E-06  [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_name], [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value], [Expr1003]     PLAN_ROW    False   4
                    |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1014],0)))  1   20  19  Compute Scalar  Compute Scalar  DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1014],0))  [Expr1003]=CONVERT_IMPLICIT(int,[Expr1014],0)   1   0   1.5944E-07  43  1.75384E-06 [Expr1003], [Expr1003]      PLAN_ROW    False   4
                    |    |--Stream Aggregate(DEFINE:([Expr1014]=Count(*))) 1   21  20  Stream Aggregate    Aggregate       [Expr1014]=Count(*) 1   0   1.5944E-06  43  1.5944E-06  [Expr1014]      PLAN_ROW    False   4
                    |         |--Table Spool   1   22  21  Table Spool Lazy Spool          2.333333    0   0   43  0   [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_name], [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value]     PLAN_ROW    False   4
                    |--Table Spool 1   23  19  Table Spool Lazy Spool          2.333333    0   0   43  0   [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_name], [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value]     PLAN_ROW    False   4
   23 rows
   SQL Server parse and compile time: 
      CPU time = 0 ms, elapsed time = 0 ms.

whereas the "old-fashioned" one produces this 11-row plan:

|--Nested Loops(Inner Join, WHERE:([Expr1003]<[Expr1008]))  1   2   1   Nested Loops    Inner Join  WHERE:([Expr1003]<[Expr1008])       3   0   1.254E-05   20  0.02939043  [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value], [Expr1003]     PLAN_ROW    False   1
      |--Stream Aggregate(DEFINE:([Expr1008]=MAX([Expr1007]))) 1   3   2   Stream Aggregate    Aggregate       [Expr1008]=MAX([Expr1007])  1   0   2.3E-06 11  0.01468965  [Expr1008]      PLAN_ROW    False   1
      |    |--Compute Scalar(DEFINE:([Expr1007]=CONVERT_IMPLICIT(int,[Expr1015],0)))   1   4   3   Compute Scalar  Compute Scalar  DEFINE:([Expr1007]=CONVERT_IMPLICIT(int,[Expr1015],0))  [Expr1007]=CONVERT_IMPLICIT(int,[Expr1015],0)   3   0   0   11  0.01468735  [Expr1007]      PLAN_ROW    False   1
      |         |--Stream Aggregate(GROUP BY:([fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value]) DEFINE:([Expr1015]=Count(*)))   1   5   4   Stream Aggregate    Aggregate   GROUP BY:([fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value])  [Expr1015]=Count(*) 3   0   5.7E-06 11  0.01468735  [Expr1015]      PLAN_ROW    False   1
      |              |--Sort(ORDER BY:([fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value] ASC))   1   6   5   Sort    Sort    ORDER BY:([fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value] ASC)      7   0.01126126  0.0001306923    36  0.01468165  [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value]     PLAN_ROW    False   1
      |                   |--Index Scan(OBJECT:([fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[name_value_uq]))  1   7   6   Index Scan  Index Scan  OBJECT:([fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[name_value_uq])    [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value] 7   0.003125    0.0001647   36  0.0032897   [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value]     PLAN_ROW    False   1
      |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1016],0)))    1   8   2   Compute Scalar  Compute Scalar  DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1016],0))  [Expr1003]=CONVERT_IMPLICIT(int,[Expr1016],0)   3   0   0   20  0.01468733  [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value], [Expr1003]     PLAN_ROW    False   1
           |--Stream Aggregate(GROUP BY:([fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value]) DEFINE:([Expr1016]=Count(*)))    1   9   8   Stream Aggregate    Aggregate   GROUP BY:([fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value])  [Expr1016]=Count(*) 3   0   5.7E-06 20  0.01468733  [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value], [Expr1016]     PLAN_ROW    False   1
                |--Sort(ORDER BY:([fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value] ASC))    1   10  9   Sort    Sort    ORDER BY:([fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value] ASC)      7   0.01126126  0.0001306723    16  0.01468163  [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value]     PLAN_ROW    False   1
                     |--Index Scan(OBJECT:([fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[name_value_uq]))   1   11  10  Index Scan  Index Scan  OBJECT:([fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[name_value_uq])    [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value] 7   0.003125    0.0001647   16  0.0032897   [fiddle_404696dc8d6846e389dd3d04d0ec3512].[dbo].[ptest].[col_value]     PLAN_ROW    False   1
11 rows
SQL Server parse and compile time: 
  CPU time = 0 ms, elapsed time = 0 ms.

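Given our lack of definitive timings (and testing with such a small amount of data is more or less meaningless anyway), I would urge you to test any and all proposed solutions against your own tables and hardware. However, as a rule of thumb, the longer the plan, the slower the query, and window functions do come at a cost! From here:

As you can see, window aggregates have a big performance impact compared with the traditional approach.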
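As a starting point for testing on your own hardware, here is a rough sketch of one way to scale the sample table up and time both approaches. Everything about the generated data is an assumption made for illustration: the name_/value_ prefixes, the 200 x 500 size, the rule that makes some values uncommon, and the use of sys.all_objects as a row source (it is assumed to hold at least 500 rows). Run it only in a scratch database, because it empties ptest first.

```sql
-- WARNING: removes the existing rows from ptest.
TRUNCATE TABLE ptest;

-- Generate 200 hypothetical names x 500 hypothetical values.
WITH n AS
(
  SELECT TOP (500) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
  FROM sys.all_objects
)
INSERT INTO ptest (col_name, col_value)
SELECT
  CONCAT('name_', a.i),
  CONCAT('value_', b.i)
FROM n AS a
CROSS JOIN n AS b
WHERE a.i <= 200
  -- Every fifth value is left out of the even-numbered names,
  -- so those values end up uncommon.
  AND NOT (b.i % 5 = 0 AND a.i % 2 = 0);

SET STATISTICS TIME ON;

-- Window function version
WITH cte1 AS
(
  SELECT
    col_value,
    COUNT(col_value) OVER (PARTITION BY col_value) AS cnt
  FROM
    ptest
)
SELECT DISTINCT col_value, cnt
FROM cte1
WHERE cnt < (SELECT MAX(cnt) FROM cte1);

-- GROUP BY version
SELECT * FROM
(
  SELECT col_value, COUNT(col_value) AS cnt
  FROM ptest
  GROUP BY col_value
) AS t
WHERE cnt <
(
  SELECT MAX(cnt) FROM
  (
    SELECT COUNT(col_value) AS cnt
    FROM ptest
    GROUP BY col_value
  ) AS u
);

SET STATISTICS TIME OFF;
```

The CPU and elapsed times are reported in the messages output of the client. The DISTINCT in the window version is there only so that both queries return one row per uncommon value and can be compared like for like.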

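In future, when asking questions of this nature, please provide a fiddle yourself. It provides a single source of truth and eliminates duplication of effort. Help us to help you! :-)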

Quoted from: https://dba.stackexchange.com/questions/298104