是什麼減慢了這個直方圖查詢？

November 27, 2018

我有一個表my_link_t，其中列t.weight類型number和分類speedcat類型整數值從 2 到 8。我想從 min=0、max=0.6、step = 0.001 拆分桶中的數據並建構一個 3D 圖以查看每個類別的權重分佈。

初始數據看起來像

weight speedcat

0.0234 2

0.8643 6 0.1854

（重量在 0 到 0.6 之間以及 speedcat 在 2 到 8 之間的數億個條目）

這些查詢返回正確的結果並在不到一分鐘的時間內完成：

--repeat for each variable. Here we loook for speedcat =8
--It takes seconds to run this query
create table   histogram_tbl_8  as (
 select  ttt."Start" as bucket_index, ttt.hist_row as bin8  --here
 FROM ((
 SELECT  Bucket*1 "Start" , Bucket "End", Count(Bucket) hist_row
 FROM (SELECT WIDTH_BUCKET (weight, 0, 0.6, 601) Bucket FROM my_link_t  where speedcat=8)
     GROUP BY Bucket ORDER BY Bucket ) ttt   )   );

speedcat上述查詢在範圍內重複了七次2..8

--if a bin is empty populate it with zero, don't skip it.
create table histogram_output as (
select  tr.bucket_index,
   CASE
       WHEN 1 &gt; (select count(*) from histogram_tbl_2   htm where htm.bucket_index = tr.bucket_index)   THEN 0
       ELSE (select htm.bin2 from histogram_tbl_2   htm where htm.bucket_index =  tr.bucket_index and rownum = 1)
    END
    as b2,
    --same for b3-b7
     CASE
       WHEN 1 &gt; (select count(*) from histogram_tbl_8   htm where htm.bucket_index = tr.bucket_index)   THEN 0
       ELSE     (select htm.bin8    from histogram_tbl_8   htm where htm.bucket_index =  tr.bucket_index and rownum = 1)
    END
    as b8
    FROM  (SELECT LEVEL as bucket_index, 0 as b2, /* 0 as b3, 0 as b4, 0 as b5, 0 as b6, 0 as b7, */ 0 as b8  FROM DUAL CONNECT BY LEVEL &lt; 600)  tr
    )

最後

     select sum(b2), sum(b3),sum(b4),sum(b5),sum(b6),sum(b7),sum(b8) from histogram_output
      select bucket_index,
             round(b2 * 1000000 / 12921) as  b2, --normalize so that total is 1000000 ppm
             -- repeat for b3-b7
             round(b8 * 1000000 / 6262) as  b8 --normalize so that total is 1000000 ppm
        from histogram_output

我得到一張像

bin_end speedcat_2 speedcat_3 speedcat_4 .. speedcat_8

0.001

0.002 .. 0.599 0.600

顯示此類別和此 bin 中對象的 ppm 現在，當我在

-- DONT USE THE EXAMPLE BELOW - it is ineefficient  (runs 2+ hours instead of seconds for the method above)
      SELECT  Bucket_2*1 "Start" , Bucket_2 "End",
      Count(Bucket_2) as b2,
      --same for b3 .. b7
      Count(Bucket_8) as b8

     FROM
     (
         SELECT WIDTH_BUCKET (t2.weight, 0, 0.6, 601) Bucket_2,
                --same for t3,.. t7
                WIDTH_BUCKET (t8.weight, 0, 0.6, 601) Bucket_8

          FROM (select weight from my_link_t where  speedcat = 2) t2,
          -- ..speedcat = 3) t3, .. speedcat = 4) t4, etc
           (select weight from my_link_t where  speedcat = 8 ) t8

     )
     GROUP BY Bucket_2 ORDER BY Bucket_2
     ------

查詢執行幾個小時（執行時間比單個查詢長約 500 倍），直到我殺死它。書籍建議在 SQL 中進行所有數據切片。這個例子表明，在復雜查詢的情況下，將數據載入到 Java 並在那里切片可能會更好。

什麼會導致差異？

簡答
你的 7-way CartesianJOIN將會有一些嚴重的性能問題。
長答案
在集合中思考。
我假設您的數據集需要包含：speedcat、bucket_index、count(*)
該數據集的簡單解決方案很簡單：
select t.speedcat
 , WIDTH_BUCKET (t.weight, 0, 0.6, 601) as bucket_index
 , count(*) N
from my_link_t t
group by t.speedcat, WIDTH_BUCKET (t.weight, 0, 0.6, 601)
這是格式 (x,y,z) 是大多數圖形包的預期格式。
如果您想要“網格”格式的結果，那麼PIVOT結果。
with data as (
 -- same SELECT as previous answer
 select t.speedcat
   , WIDTH_BUCKET (t.weight, 0, 0.6, 601) as bucket_index
   , count(*) N
 from my_link_t t
 group by t.speedcat, WIDTH_BUCKET (t.weight, 0, 0.6, 601)
)
select *
from data
 pivot (
   sum(N)
   for speedcat in ( 2 as "speedcat_2"
                    ,3 as "speedcat_3"
                    ,4 as "speedcat_4"
                    ,5 as "speedcat_5"
                    ,6 as "speedcat_6"
                    ,7 as "speedcat_7"
                    ,8 as "speedcat_8"
                    )
 )

引用自：https://dba.stackexchange.com/questions/223574

是什麼減慢了這個直方圖查詢？

簡答

長答案

相關問答

如何創建索引以提高在 oracle 中創建表的聚合函式的性能

如何通過 SQL 隔離一般的 Oracle 10g 問題

性能問題：PL/SQL

查詢優化OracleDBM小號○r一個Cl和D乙米小號Oracle DBMS

如何在使用者連接後正好 8 小時終止 Oracle DB 中的使用者會話？

努力提高此程式碼的效率