Oracle
是什麼減慢了這個直方圖查詢?
我有一個表
my_link_t
,其中列t.weight
類型number
和分類speedcat
類型整數值從 2 到 8。我想從 min=0、max=0.6、step = 0.001 拆分桶中的數據並建構一個 3D 圖以查看每個類別的權重分佈。初始數據看起來像
weight speedcat
0.0234 2
0.8643 6 0.1854
7
(重量在 0 到 0.6 之間以及 speedcat 在 2 到 8 之間的數億個條目)
這些查詢返回正確的結果並在不到一分鐘的時間內完成:
--repeat for each variable. Here we loook for speedcat =8 --It takes seconds to run this query create table histogram_tbl_8 as ( select ttt."Start" as bucket_index, ttt.hist_row as bin8 --here FROM (( SELECT Bucket*1 "Start" , Bucket "End", Count(Bucket) hist_row FROM (SELECT WIDTH_BUCKET (weight, 0, 0.6, 601) Bucket FROM my_link_t where speedcat=8) GROUP BY Bucket ORDER BY Bucket ) ttt ) );
speedcat
上述查詢在範圍內 重複了七次2..8
--if a bin is empty populate it with zero, don't skip it. create table histogram_output as ( select tr.bucket_index, CASE WHEN 1 > (select count(*) from histogram_tbl_2 htm where htm.bucket_index = tr.bucket_index) THEN 0 ELSE (select htm.bin2 from histogram_tbl_2 htm where htm.bucket_index = tr.bucket_index and rownum = 1) END as b2, --same for b3-b7 CASE WHEN 1 > (select count(*) from histogram_tbl_8 htm where htm.bucket_index = tr.bucket_index) THEN 0 ELSE (select htm.bin8 from histogram_tbl_8 htm where htm.bucket_index = tr.bucket_index and rownum = 1) END as b8 FROM (SELECT LEVEL as bucket_index, 0 as b2, /* 0 as b3, 0 as b4, 0 as b5, 0 as b6, 0 as b7, */ 0 as b8 FROM DUAL CONNECT BY LEVEL < 600) tr )
最後
select sum(b2), sum(b3),sum(b4),sum(b5),sum(b6),sum(b7),sum(b8) from histogram_output select bucket_index, round(b2 * 1000000 / 12921) as b2, --normalize so that total is 1000000 ppm -- repeat for b3-b7 round(b8 * 1000000 / 6262) as b8 --normalize so that total is 1000000 ppm from histogram_output
我得到一張像
bin_end speedcat_2 speedcat_3 speedcat_4 .. speedcat_8
0.001
0.002 .. 0.599 0.600
顯示此類別和此 bin 中對象的 ppm 現在,當我在
-- DONT USE THE EXAMPLE BELOW - it is ineefficient (runs 2+ hours instead of seconds for the method above) SELECT Bucket_2*1 "Start" , Bucket_2 "End", Count(Bucket_2) as b2, --same for b3 .. b7 Count(Bucket_8) as b8 FROM ( SELECT WIDTH_BUCKET (t2.weight, 0, 0.6, 601) Bucket_2, --same for t3,.. t7 WIDTH_BUCKET (t8.weight, 0, 0.6, 601) Bucket_8 FROM (select weight from my_link_t where speedcat = 2) t2, -- ..speedcat = 3) t3, .. speedcat = 4) t4, etc (select weight from my_link_t where speedcat = 8 ) t8 ) GROUP BY Bucket_2 ORDER BY Bucket_2 ------
查詢執行幾個小時(執行時間比單個查詢長約 500 倍),直到我殺死它。書籍建議在 SQL 中進行所有數據切片。這個例子表明,在復雜查詢的情況下,將數據載入到 Java 並在那里切片可能會更好。
什麼會導致差異?
簡答
你的 7-way Cartesian
JOIN
將會有一些嚴重的性能問題。長答案
在集合中思考。
我假設您的數據集需要包含:speedcat、bucket_index、count(*)
該數據集的簡單解決方案很簡單:
select t.speedcat , WIDTH_BUCKET (t.weight, 0, 0.6, 601) as bucket_index , count(*) N from my_link_t t group by t.speedcat, WIDTH_BUCKET (t.weight, 0, 0.6, 601)
這是格式 (x,y,z) 是大多數圖形包的預期格式。
如果您想要“網格”格式的結果,那麼
PIVOT
結果。with data as ( -- same SELECT as previous answer select t.speedcat , WIDTH_BUCKET (t.weight, 0, 0.6, 601) as bucket_index , count(*) N from my_link_t t group by t.speedcat, WIDTH_BUCKET (t.weight, 0, 0.6, 601) ) select * from data pivot ( sum(N) for speedcat in ( 2 as "speedcat_2" ,3 as "speedcat_3" ,4 as "speedcat_4" ,5 as "speedcat_5" ,6 as "speedcat_6" ,7 as "speedcat_7" ,8 as "speedcat_8" ) )