A query to classify each column value into two categories: Common, Not Common
I have a table with the following structure and data:
create table PTEST
(
  col_name  VARCHAR(50),
  col_value VARCHAR(50)
)

COL_NAME    COL_VALUE
-----------------------
first       apple
first       banana
second      apple
second      banana
second      orange
third       apple
third       banana
What I want to do is classify each value in the col_value column into two categories: Common and Not common.

A value is considered 'Common' if it appears under every col_name. So apple is common because it appears for col_name = first, col_name = second and col_name = third, and the same goes for banana. Orange is not common because it only appears for col_name = second.

The desired output would be something like this:
COL_NAME    COL_VALUE    STATUS
---------------------------------
first       apple        Common
first       banana       Common
second      banana       Common
second      apple        Common
second      orange       Not common
third       apple        Common
third       banana       Common
The query I wrote for this is:
select col_name,
       col_value,
       case when count_col = count_val then 'Common' else 'Not common' end STATUS
from (select t.col_name,
             count(distinct t.col_name) over() count_col,
             t.col_value,
             count(t.col_value) over(partition by t.col_value) count_val
      from PTEST t)
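A variant of the same idea that avoids DISTINCT inside a window aggregate - which some engines, SQL Server among them, do not allow - is sketched below; it uses the well-known DENSE_RANK trick to count the distinct col_name values and adds an alias on the derived table (both details are assumptions about why the query might fail elsewhere, not part of the original):

select col_name,
       col_value,
       case when count_col = count_val then 'Common' else 'Not common' end AS STATUS
from (select t.col_name,
             -- DENSE_RANK ascending + DENSE_RANK descending - 1 = number of distinct col_name values
             dense_rank() over (order by t.col_name)
               + dense_rank() over (order by t.col_name desc) - 1 AS count_col,
             t.col_value,
             count(t.col_value) over (partition by t.col_value) AS count_val
      from PTEST t) AS x;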
I was wondering if there are better ways to do that.

Thanks in advance.
Two ways of doing this are as follows (all of the code below can be found in a fiddle for SQL Server - with plans - here - with the performance analysis at the end):
The table:
CREATE TABLE ptest
(
  col_name  VARCHAR (50) NOT NULL,
  col_value VARCHAR (50) NOT NULL,
  CONSTRAINT name_value_uq UNIQUE (col_name, col_value)
);
Populate it:
INSERT INTO ptest VALUES
('first',  'apple'), ('first',  'banana'),
('second', 'apple'), ('second', 'banana'),
('third',  'apple'), ('third',  'banana'), ('third', 'orange');
The first way:
First, we want to know how many times each fruit appears in the table as a whole.
SELECT col_name,
       col_value,
       COUNT(col_value) OVER (PARTITION BY col_value) AS cnt
FROM ptest;
Result:
col_name    col_value    cnt
first       apple          3
first       banana         3
second      apple          3
second      banana         3
third       apple          3
third       banana         3
third       orange         1
7 rows
There are various ways of finding the fruits that occur fewer times than the maximum cnt (3), which is what you define as common - so at a glance we can see that orange is uncommon. Here, I'm using CTEs to do it:
WITH cte1 AS
(
  SELECT col_name,
         col_value,
         COUNT(col_value) OVER (PARTITION BY col_value) AS cnt
         -- COUNT(col_value) OVER (PARTITION BY col_name ORDER BY col_value)
  FROM ptest
),
cte2 AS
(
  SELECT MAX(cnt) AS mcnt
  FROM cte1
)
SELECT *
FROM cte1
WHERE cnt < (SELECT mcnt FROM cte2);
Result:
col_name    col_value    cnt
third       orange         1
There you go!
To stay closer to your own original (non-working - see the fiddle) query, you could do this instead (again, in the fiddle):
WITH cte1 AS
(
  SELECT col_name,
         col_value,
         COUNT(col_value) OVER (PARTITION BY col_value) AS cnt
         -- COUNT(col_value) OVER (PARTITION BY col_name ORDER BY col_value)
  FROM ptest
),
cte2 AS
(
  SELECT MAX(cnt) AS mcnt
  FROM cte1
)
SELECT col_name,
       col_value,
       CASE
         WHEN cnt < (SELECT mcnt FROM cte2) THEN 'Uncommon'
         ELSE 'Common'
       END AS status
FROM cte1;
Same result.
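If you would rather label every row in a single pass, closer to the desired output in the question, one more variant - just a sketch, relying on the (col_name, col_value) uniqueness guaranteed by the constraint above - compares each value's row count with the number of distinct col_name values directly:

SELECT p.col_name,
       p.col_value,
       CASE
         WHEN COUNT(*) OVER (PARTITION BY p.col_value)
            = (SELECT COUNT(DISTINCT col_name) FROM ptest)  -- total number of col_name groups
         THEN 'Common'
         ELSE 'Not common'
       END AS status
FROM ptest p;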
The second way:
If you are running on an antique that has no window functions (or the latest version of MySQL :-) ), you can also do it this way:
SELECT *
FROM
(
  SELECT col_value, COUNT(col_value) AS cnt
  FROM ptest
  GROUP BY col_value
) AS t
WHERE cnt <
(
  SELECT MAX(cnt)
  FROM
  (
    SELECT col_value, COUNT(col_value) AS cnt
    FROM ptest
    GROUP BY col_value
  ) AS u
);
Result:
col_value    cnt
orange         1
There it is again!
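And if one row per fruit with an explicit label is enough, the same GROUP BY can produce the status directly - again only a sketch along the lines above:

SELECT col_value,
       CASE
         WHEN COUNT(*) = (SELECT COUNT(DISTINCT col_name) FROM ptest)
         THEN 'Common'
         ELSE 'Not common'
       END AS status
FROM ptest
GROUP BY col_value;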
You asked in your question:
I was wondering if there are better ways to do that.
So, I added the following lines at the bottom of the fiddle (documented here):
SET STATISTICS PROFILE ON;
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
and finally
SET SHOWPLAN_ALL ON;
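Note that SET SHOWPLAN_ALL has to be the only statement in its batch, and while it is ON the statements that follow return their estimated plan rows instead of executing - a minimal sketch of using it outside the fiddle (in SSMS or sqlcmd, where GO separates batches):

SET SHOWPLAN_ALL ON;
GO

-- Not executed: the estimated plan rows for this query are returned instead.
SELECT col_value, COUNT(col_value) AS cnt
FROM ptest
GROUP BY col_value;
GO

SET SHOWPLAN_ALL OFF;
GO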
It doesn't appear to be possible to get very fine-grained timings out of db<>fiddle, but the plans are interesting.
The window function query produces the following plan (23 rows):
(Operator tree only - the per-operator estimate/cost columns and the fiddle-generated database prefix on object names are omitted; the full SHOWPLAN_ALL output is in the fiddle.)

|--Nested Loops(Inner Join, WHERE:([Expr1003]<[Expr1008]))
   |--Stream Aggregate(DEFINE:([Expr1008]=MAX([Expr1007])))
   |  |--Nested Loops(Inner Join)
   |     |--Table Spool
   |     |  |--Segment
   |     |     |--Sort(ORDER BY:([ptest].[col_value] ASC))
   |     |        |--Index Scan(OBJECT:([ptest].[name_value_uq]))
   |     |--Nested Loops(Inner Join, WHERE:((1)))
   |        |--Compute Scalar(DEFINE:([Expr1007]=CONVERT_IMPLICIT(int,[Expr1012],0)))
   |        |  |--Stream Aggregate(DEFINE:([Expr1012]=Count(*)))
   |        |     |--Table Spool
   |        |--Table Spool
   |--Nested Loops(Inner Join)
      |--Table Spool
      |  |--Segment
      |     |--Sort(ORDER BY:([ptest].[col_value] ASC))
      |        |--Index Scan(OBJECT:([ptest].[name_value_uq]))
      |--Nested Loops(Inner Join, WHERE:((1)))
         |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1014],0)))
         |  |--Stream Aggregate(DEFINE:([Expr1014]=Count(*)))
         |     |--Table Spool
         |--Table Spool

23 rows
SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.
while the "old-school" one produces this 11-row plan:
(Again, operator tree only - full output in the fiddle.)

|--Nested Loops(Inner Join, WHERE:([Expr1003]<[Expr1008]))
   |--Stream Aggregate(DEFINE:([Expr1008]=MAX([Expr1007])))
   |  |--Compute Scalar(DEFINE:([Expr1007]=CONVERT_IMPLICIT(int,[Expr1015],0)))
   |     |--Stream Aggregate(GROUP BY:([ptest].[col_value]) DEFINE:([Expr1015]=Count(*)))
   |        |--Sort(ORDER BY:([ptest].[col_value] ASC))
   |           |--Index Scan(OBJECT:([ptest].[name_value_uq]))
   |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1016],0)))
      |--Stream Aggregate(GROUP BY:([ptest].[col_value]) DEFINE:([Expr1016]=Count(*)))
         |--Sort(ORDER BY:([ptest].[col_value] ASC))
            |--Index Scan(OBJECT:([ptest].[name_value_uq]))

11 rows
SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.
Given the lack of definitive timings - and, in any case, testing with such a tiny amount of data is more or less meaningless - I would urge you to test any and all proposed solutions against your own tables and hardware... As a rule of thumb, though, the longer the plan, the slower it tends to be, and window functions come at a cost! From here:
As you can see, window aggregates have quite a significant impact on performance compared to the traditional approach.
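If you do want to experiment with more than a handful of rows before trying this on your real tables, one way to generate a larger sample is sketched below (the table name bigtest and the row counts are arbitrary choices of mine, not part of the fiddle):

CREATE TABLE bigtest
(
  col_name  VARCHAR (50) NOT NULL,
  col_value VARCHAR (50) NOT NULL
);

-- ~100,000 rows spread over 100 names and 2,000 values, purely for timing comparisons.
WITH n AS
(
  SELECT TOP (100000)
         ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
  FROM sys.all_objects AS a
  CROSS JOIN sys.all_objects AS b
)
INSERT INTO bigtest (col_name, col_value)
SELECT CONCAT('name_',  i % 100),
       CONCAT('value_', i % 2000)
FROM n;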
In future, when asking questions of this nature, please provide a fiddle yourself - it gives a single source of truth and eliminates duplicated effort - help us to help you! :-)