Oracle

我應該先分組然後加入,還是先加入然後再分組?

  • January 1, 2015

(當只涉及兩個表時,oracle 的優化器對這兩種方法做同樣的事情,因此我的問題涉及三個表)

假設三個表t1t2並且從左到右t3具有成對關係:1:n

t1 (1)--(n) t2 (1)--(n) t3

每個分區在所有連接中使用的公共列上p,每個都有一個主鍵id,外鍵t2.id1 -> t1.idt3.id2 -> t2.id.

實際上,n(兩者)都很小。可能每行約 2t2行,每t1行約 3t3t2

我希望像這樣加入這些表:

select t1.id t1id, t2.id t2id, count(1) t3count
from t1 join t2 on (t1.p = t2.p and t1.id = t2.id1)
       join t3 on (t2.p = t3.p and t2.id = t3.id2)
group by t1.id, t2.id

它給出t3t2行的行數並將t1數據連接到該行數。您可以想像得到以下輸出:

t1id t2id t3count
---- ---- -------
  1    1      3
  1    2      1
  1    3      2
  2    4      5
  3    5      1
  3    6      1

現在將此查詢與:

with t3g as ( -- as in 't3 grouped'
   select id2, count(1) t3count
   from t3
   group by id2
)
select t1.id t1id, t2id, t3count
from t1 join t2  on (t1.p = t2.p  and t1.id = t2.id1)
       join t3g on (t2.p = t3g.p and t2.id = id2)

(後者可以用嵌套select代替with)。

在這裡,我基本上是預先對t3錶進行分組,然後將其與其他兩個表連接起來。

第一個選擇在 oracle 中為我提供了以下計劃:

------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      | 69578 |  5367K|       |  2363   (1)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ALL  |      | 69578 |  5367K|       |  2363   (1)| 00:00:01 |     1 |1048575|
|   2 |   HASH GROUP BY       |      | 69578 |  5367K|  6336K|  2363   (1)| 00:00:01 |       |       |
|*  3 |    HASH JOIN          |      | 69578 |  5367K|       |  1083   (2)| 00:00:01 |       |       |
|   4 |     TABLE ACCESS FULL | t2   | 81635 |  1275K|       |   256   (1)| 00:00:01 |     1 |1048575|
|*  5 |     HASH JOIN         |      | 70033 |  4308K|       |   823   (1)| 00:00:01 |       |       |
|   6 |      TABLE ACCESS FULL| t1   | 81622 |  1753K|       |   616   (1)| 00:00:01 |     1 |1048575|
|   7 |      TABLE ACCESS FULL| t3   | 70033 |  2804K|       |   204   (1)| 00:00:01 |     1 |1048575|
------------------------------------------------------------------------------------------------------

優化器決定先加入 t1 和 t3,否則,不出意外。

第二個查詢給出以下計劃:

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         | 69578 |    21M|       |  3125   (1)| 00:00:01 |       |       |
|*  1 |  HASH JOIN                   |         | 69578 |    21M|  8144K|  3125   (1)| 00:00:01 |       |       |
|   2 |   PART JOIN FILTER CREATE    | :BF0000 | 70033 |  7317K|       |   950   (1)| 00:00:01 |       |       |
|   3 |    VIEW                      |         | 70033 |  7317K|       |   950   (1)| 00:00:01 |       |       |
|   4 |     HASH GROUP BY            |         | 70033 |  2804K|  4424K|   950   (1)| 00:00:01 |       |       |
|   5 |      PARTITION RANGE ALL     |         | 70033 |  2804K|       |   204   (1)| 00:00:01 |     1 |1048575|
|   6 |       TABLE ACCESS FULL      | t3      | 70033 |  2804K|       |   204   (1)| 00:00:01 |     1 |1048575|
|   7 |   PARTITION RANGE JOIN-FILTER|         | 81622 |    17M|       |   876   (1)| 00:00:01 |:BF0000|:BF0000|
|*  8 |    HASH JOIN                 |         | 81622 |    17M|       |   876   (1)| 00:00:01 |       |       |
|   9 |     TABLE ACCESS FULL        | t2      | 81635 |  4623K|       |   256   (1)| 00:00:01 |:BF0000|:BF0000|
|  10 |     TABLE ACCESS FULL        | t1      | 81622 |    12M|       |   616   (1)| 00:00:01 |:BF0000|:BF0000|
----------------------------------------------------------------------------------------------------------------

現在,優化器首先t3按照指示進行分組,然後加入t1t2然後利用分區將分組t3加入到 join 中t1-t2

如果我添加一個 where 子句限制 by t1.id,我會得到以下非常相似的計劃:

首先選擇:

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |      |     1 |    79 |     7  (15)| 00:00:01 |       |       |
|   1 |  HASH GROUP BY                         |      |     1 |    79 |     7  (15)| 00:00:01 |       |       |
|   2 |   NESTED LOOPS                         |      |       |       |            |          |       |       |
|   3 |    NESTED LOOPS                        |      |     1 |    79 |     6   (0)| 00:00:01 |       |       |
|   4 |     NESTED LOOPS                       |      |     1 |    63 |     5   (0)| 00:00:01 |       |       |
|   5 |      TABLE ACCESS BY GLOBAL INDEX ROWID| t1   |     1 |    22 |     3   (0)| 00:00:01 | ROWID | ROWID |
|*  6 |       INDEX RANGE SCAN                 | t1pk |     1 |       |     2   (0)| 00:00:01 |       |       |
|   7 |      PARTITION RANGE ITERATOR          |      |     1 |    41 |     2   (0)| 00:00:01 |   KEY |   KEY |
|*  8 |       TABLE ACCESS BY LOCAL INDEX ROWID| t3   |     1 |    41 |     2   (0)| 00:00:01 |   KEY |   KEY |
|*  9 |        INDEX RANGE SCAN                | t3fk |     1 |       |     1   (0)| 00:00:01 |   KEY |   KEY |
|* 10 |     INDEX UNIQUE SCAN                  | t2pk |     1 |       |     0   (0)| 00:00:01 |       |       |
|* 11 |    TABLE ACCESS BY GLOBAL INDEX ROWID  | t2   |     1 |    16 |     1   (0)| 00:00:01 | ROWID | ROWID |
---------------------------------------------------------------------------------------------------------------

二選一:

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |      |     1 |   311 |     7  (15)| 00:00:01 |       |       |
|   1 |  NESTED LOOPS                            |      |       |       |            |          |       |       |
|   2 |   NESTED LOOPS                           |      |     1 |   311 |     7  (15)| 00:00:01 |       |       |
|   3 |    NESTED LOOPS                          |      |     1 |   253 |     6  (17)| 00:00:01 |       |       |
|   4 |     TABLE ACCESS BY GLOBAL INDEX ROWID   | t1   |     1 |   164 |     3   (0)| 00:00:01 | ROWID | ROWID |
|*  5 |      INDEX RANGE SCAN                    | t1pk |     1 |       |     2   (0)| 00:00:01 |       |       |
|   6 |     VIEW PUSHED PREDICATE                |      |     1 |    89 |     3  (34)| 00:00:01 |       |       |
|   7 |      SORT GROUP BY                       |      |     1 |    41 |     3  (34)| 00:00:01 |       |       |
|*  8 |       FILTER                             |      |       |       |            |          |       |       |
|   9 |        PARTITION RANGE SINGLE            |      |     1 |    41 |     2   (0)| 00:00:01 |   KEY |   KEY |
|* 10 |         TABLE ACCESS BY LOCAL INDEX ROWID| t3   |     1 |    41 |     2   (0)| 00:00:01 |   KEY |   KEY |
|* 11 |          INDEX RANGE SCAN                | t3fk |     1 |       |     1   (0)| 00:00:01 |   KEY |   KEY |
|* 12 |    INDEX UNIQUE SCAN                     | t2pk |     1 |       |     0   (0)| 00:00:01 |       |       |
|* 13 |   TABLE ACCESS BY GLOBAL INDEX ROWID     | t2   |     1 |    58 |     1   (0)| 00:00:01 | ROWID | ROWID |
-----------------------------------------------------------------------------------------------------------------

雖然我模糊地了解優化器在做什麼,但我無法得出哪個查詢更好的結論。計劃中是否有一些明顯的東西應該​​指示我更喜歡一個查詢而不是另一個查詢,或者它們在性能方面看起來是否相當?

關於限制的注意事項t1.id:為了簡單起見,我簡化了表格的描述以及它們之間的關係。從這個意義上說,過濾 by 似乎t1.id不會有效,t3因為它顯然沒有包含t1.id. 實際上,t2是一個弱實體, 主鍵(t1.id, t2.id)t3外鍵t2都是這樣的, 包括t1.id. t3fk這就是為什麼在上面的最後兩個計劃中有一個索引範圍掃描。

優化器總是試圖盡可能快地減少數據量。如果沒有,您的統計數據可能不好。

您的計劃 1 顯示處理的行數較少,這很好。優化器能夠更快地減少數據量。這些數字可能並不完全正確,但它根據優化器統計資訊為您提供了一個想法。

計劃一:

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |      |     1 |    79 |     7  (15)| 00:00:01 |       |       |
|   1 |  HASH GROUP BY                         |      |     1 |    79 |     7  (15)| 00:00:01 |       |       |
|   2 |   NESTED LOOPS                         |      |       |       |            |          |       |       |
|   3 |    NESTED LOOPS                        |      |     1 |    79 |     6   (0)| 00:00:01 |       |       |
|   4 |     NESTED LOOPS                       |      |     1 |    63 |     5   (0)| 00:00:01 |       |       |
|   5 |      TABLE ACCESS BY GLOBAL INDEX ROWID| t1   |     1 |    22 |     3   (0)| 00:00:01 | ROWID | ROWID |
|*  6 |       INDEX RANGE SCAN                 | t1pk |     1 |       |     2   (0)| 00:00:01 |       |       |
|   7 |      PARTITION RANGE ITERATOR          |      |     1 |    41 |     2   (0)| 00:00:01 |   KEY |   KEY |
|*  8 |       TABLE ACCESS BY LOCAL INDEX ROWID| t3   |     1 |    41 |     2   (0)| 00:00:01 |   KEY |   KEY |
|*  9 |        INDEX RANGE SCAN                | t3fk |     1 |       |     1   (0)| 00:00:01 |   KEY |   KEY |
|* 10 |     INDEX UNIQUE SCAN                  | t2pk |     1 |       |     0   (0)| 00:00:01 |       |       |
|* 11 |    TABLE ACCESS BY GLOBAL INDEX ROWID  | t2   |     1 |    16 |     1   (0)| 00:00:01 | ROWID | ROWID |
---------------------------------------------------------------------------------------------------------------

它表明 Oracle 執行了HASH GROUP BY而計劃 2 執行了SORT GROUP BY

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |      |     1 |   311 |     7  (15)| 00:00:01 |       |       |
|   1 |  NESTED LOOPS                            |      |       |       |            |          |       |       |
|   2 |   NESTED LOOPS                           |      |     1 |   311 |     7  (15)| 00:00:01 |       |       |
|   3 |    NESTED LOOPS                          |      |     1 |   253 |     6  (17)| 00:00:01 |       |       |
|   4 |     TABLE ACCESS BY GLOBAL INDEX ROWID   | t1   |     1 |   164 |     3   (0)| 00:00:01 | ROWID | ROWID |
|*  5 |      INDEX RANGE SCAN                    | t1pk |     1 |       |     2   (0)| 00:00:01 |       |       |
|   6 |     VIEW PUSHED PREDICATE                |      |     1 |    89 |     3  (34)| 00:00:01 |       |       |
|   7 |      SORT GROUP BY                       |      |     1 |    41 |     3  (34)| 00:00:01 |       |       |
|*  8 |       FILTER                             |      |       |       |            |          |       |       |
|   9 |        PARTITION RANGE SINGLE            |      |     1 |    41 |     2   (0)| 00:00:01 |   KEY |   KEY |
|* 10 |         TABLE ACCESS BY LOCAL INDEX ROWID| t3   |     1 |    41 |     2   (0)| 00:00:01 |   KEY |   KEY |
|* 11 |          INDEX RANGE SCAN                | t3fk |     1 |       |     1   (0)| 00:00:01 |   KEY |   KEY |
|* 12 |    INDEX UNIQUE SCAN                     | t2pk |     1 |       |     0   (0)| 00:00:01 |       |       |
|* 13 |   TABLE ACCESS BY GLOBAL INDEX ROWID     | t2   |     1 |    58 |     1   (0)| 00:00:01 | ROWID | ROWID |
-----------------------------------------------------------------------------------------------------------------

**結論:**在大多數情況下,我更喜歡HASH GROUP BY。因此,在計劃 1 中,Oracle 在 GROUP BY 之後做得更好,數據更少。計劃 1 更好的 2 個原因。

我假設這不是將使用此連接星座的唯一 SQL 語句。您必須為每個 SQL 決定哪種方式是最好的。

如果你有類似下面的 SQL,with 子句會好很多:

with t3g as ( -- as in 't3 grouped'
   select id2, count(1) t3count
   from t3
   where id = :1 -- new line !!!!!!!!
   group by id2
)
select t1.id t1id, t2id, t3count
from t1 join t2  on (t1.p = t2.p  and t1.id = t2.id1)
       join t3g on (t2.p = t3g.p and t2.id = id2)

如果在初始查詢中省略分區列,則應該可以完全從索引完成查詢。這應該明顯更快。

鑑於此範例,我將刪除 t1 和 t2,因為鍵列應該可從 t3 獲得。這應該明顯更快。

如果您需要來自 t1 和/或 t2 的數據,請將它們連接到 t3。當您限制 t3.id 而不是 t1.id 時檢查計劃。

沒有限制子句的計劃通常使用全表掃描,這往往是訪問所有行的最快方式。該計劃可能與帶有限制條款的計劃有很大不同。

優化器試圖限制訪問數據的成本。這些規則可能適用:

  • 如果需要訪問超過百分之幾的行,則進行表掃描。
  • 首先選擇最小的結果集。
  • 如果所有列都在索引中,則從索引而不是表訪問數據,.

引用自:https://dba.stackexchange.com/questions/52061