Oracle

是什麼減慢了這個直方圖查詢?

  • November 27, 2018

我有一個表my_link_t,其中列t.weight類型number和分類speedcat類型整數值從 2 到 8。我想從 min=0、max=0.6、step = 0.001 拆分桶中的數據並建構一個 3D 圖以查看每個類別的權重分佈。

初始數據看起來像

weight speedcat

0.0234 2

0.8643 6 0.1854

7

(重量在 0 到 0.6 之間以及 speedcat 在 2 到 8 之間的數億個條目)

這些查詢返回正確的結果並在不到一分鐘的時間內完成:

--repeat for each variable. Here we loook for speedcat =8
--It takes seconds to run this query
create table   histogram_tbl_8  as (
 select  ttt."Start" as bucket_index, ttt.hist_row as bin8  --here
 FROM ((
 SELECT  Bucket*1 "Start" , Bucket "End", Count(Bucket) hist_row
 FROM (SELECT WIDTH_BUCKET (weight, 0, 0.6, 601) Bucket FROM my_link_t  where speedcat=8)
     GROUP BY Bucket ORDER BY Bucket ) ttt   )   );

speedcat上述查詢在範圍內 重複了七次2..8

--if a bin is empty populate it with zero, don't skip it.
create table histogram_output as (
select  tr.bucket_index,
   CASE
       WHEN 1 > (select count(*) from histogram_tbl_2   htm where htm.bucket_index = tr.bucket_index)   THEN 0
       ELSE (select htm.bin2 from histogram_tbl_2   htm where htm.bucket_index =  tr.bucket_index and rownum = 1)
    END
    as b2,
    --same for b3-b7
     CASE
       WHEN 1 > (select count(*) from histogram_tbl_8   htm where htm.bucket_index = tr.bucket_index)   THEN 0
       ELSE     (select htm.bin8    from histogram_tbl_8   htm where htm.bucket_index =  tr.bucket_index and rownum = 1)
    END
    as b8
    FROM  (SELECT LEVEL as bucket_index, 0 as b2, /* 0 as b3, 0 as b4, 0 as b5, 0 as b6, 0 as b7, */ 0 as b8  FROM DUAL CONNECT BY LEVEL < 600)  tr
    )

最後

     select sum(b2), sum(b3),sum(b4),sum(b5),sum(b6),sum(b7),sum(b8) from histogram_output
      select bucket_index,
             round(b2 * 1000000 / 12921) as  b2, --normalize so that total is 1000000 ppm
             -- repeat for b3-b7
             round(b8 * 1000000 / 6262) as  b8 --normalize so that total is 1000000 ppm
        from histogram_output

我得到一張像

bin_end speedcat_2 speedcat_3 speedcat_4 .. speedcat_8

0.001

0.002 .. 0.599 0.600

顯示此類別和此 bin 中對象的 ppm 現在,當我在

-- DONT USE THE EXAMPLE BELOW - it is ineefficient  (runs 2+ hours instead of seconds for the method above)
      SELECT  Bucket_2*1 "Start" , Bucket_2 "End",
      Count(Bucket_2) as b2,
      --same for b3 .. b7
      Count(Bucket_8) as b8

     FROM
     (
         SELECT WIDTH_BUCKET (t2.weight, 0, 0.6, 601) Bucket_2,
                --same for t3,.. t7
                WIDTH_BUCKET (t8.weight, 0, 0.6, 601) Bucket_8

          FROM (select weight from my_link_t where  speedcat = 2) t2,
          -- ..speedcat = 3) t3, .. speedcat = 4) t4, etc
           (select weight from my_link_t where  speedcat = 8 ) t8

     )
     GROUP BY Bucket_2 ORDER BY Bucket_2
     ------

查詢執行幾個小時(執行時間比單個查詢長約 500 倍),直到我殺死它。書籍建議在 SQL 中進行所有數據切片。這個例子表明,在復雜查詢的情況下,將數據載入到 Java 並在那里切片可能會更好。

什麼會導致差異?

簡答

你的 7-way CartesianJOIN將會有一些嚴重的性能問題。

長答案

在集合中思考。

我假設您的數據集需要包含:speedcat、bucket_index、count(*)

該數據集的簡單解決方案很簡單:

select t.speedcat
 , WIDTH_BUCKET (t.weight, 0, 0.6, 601) as bucket_index
 , count(*) N
from my_link_t t
group by t.speedcat, WIDTH_BUCKET (t.weight, 0, 0.6, 601)

這是格式 (x,y,z) 是大多數圖形包的預期格式。

如果您想要“網格”格式的結果,那麼PIVOT結果。

with data as (
 -- same SELECT as previous answer
 select t.speedcat
   , WIDTH_BUCKET (t.weight, 0, 0.6, 601) as bucket_index
   , count(*) N
 from my_link_t t
 group by t.speedcat, WIDTH_BUCKET (t.weight, 0, 0.6, 601)
)
select *
from data
 pivot (
   sum(N)
   for speedcat in ( 2 as "speedcat_2"
                    ,3 as "speedcat_3"
                    ,4 as "speedcat_4"
                    ,5 as "speedcat_5"
                    ,6 as "speedcat_6"
                    ,7 as "speedcat_7"
                    ,8 as "speedcat_8"
                    )
 )

引用自:https://dba.stackexchange.com/questions/223574