Improving performance of nasty nested-view joins

April 27, 2012

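I have a moderately sized database spread across a few tables. The rough schema is:

Input data (data ID, session ID and a few statistically significant fields)
Input files (data ID and a blob)
Stage 1 output files (data ID and a blob)
Stage 2 output files (data ID and a blob)
Category 1 results (data ID and some booleans)
Category 2 results (data ID and some integers)
Category 3 results (data ID and some integers)

Each table has about 200,000 rows.

I also have a view that basically glues all of these together, so that I can SELECT with a bunch of IDs (usually picking them by session ID) and see all the relevant data on one page.

The view works fine, and the index usage in the query plan looks sane, but the results are not fast: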

> EXPLAIN ANALYZE SELECT * FROM overlay WHERE test_session=12345;

                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Right Join  (cost=7.19..74179.49 rows=10 width=305) (actual time=10680.129..10680.494 rows=4 loops=1)
  Merge Cond: (p.data_id = d.id)
  ->  Merge Join  (cost=7.19..75077.04 rows=183718 width=234) (actual time=0.192..10434.995 rows=173986 loops=1)
        Merge Cond: (p.data_id = input_file.data_id)
        ->  Merge Join  (cost=7.19..69917.74 rows=183718 width=222) (actual time=0.173..9255.653 rows=173986 loops=1)
              Merge Cond: (p.data_id = stage1_output_file.data_id)
              ->  Merge Join  (cost=5.50..62948.54 rows=183718 width=186) (actual time=0.153..8081.949 rows=173986 loops=1)
                    Merge Cond: (p.data_id = stage2_output_file.data_id)
                    ->  Merge Join  (cost=3.90..55217.36 rows=183723 width=150) (actual time=0.132..6918.814 rows=173986 loops=1)
                          Merge Cond: (p.data_id = stage3_output_file.data_id)
                          ->  Nested Loop  (cost=2.72..47004.01 rows=183723 width=114) (actual time=0.111..5753.105 rows=173986 loops=1)
                                Join Filter: (p.impression = istr.id)
                                ->  Merge Join  (cost=1.68..30467.90 rows=183723 width=102) (actual time=0.070..2675.733 rows=173986 loops=1)
                                      Merge Cond: (p.data_id = s.data_id)
                                      ->  Merge Join  (cost=1.68..19031.56 rows=183723 width=58) (actual time=0.049..1501.546 rows=173986 loops=1)
                                            Merge Cond: (p.data_id = t.data_id)
                                            ->  Index Scan using Category1_Results_pkey on Category1_Results p  (cost=0.00..7652.17 rows=183723 width=18) (actual time=0.025..315.531 rows=173986 loops=1)
                                            ->  Index Scan using Category3_Results_pkey on Category3_Results t  (cost=0.00..8624.43 rows=183787 width=40) (actual time=0.016..321.460 rows=173986 loops=1)
                                      ->  Index Scan using Category2_Results_pkey on Category2_Results s  (cost=0.00..8681.47 rows=183787 width=44) (actual time=0.014..320.794 rows=173986 loops=1)
                                ->  Materialize  (cost=1.04..1.08 rows=4 width=20) (actual time=0.001..0.007 rows=4 loops=173986)
                                      ->  Seq Scan on Category1_impression_str istr  (cost=0.00..1.04 rows=4 width=20) (actual time=0.005..0.012 rows=4 loops=1)
                          ->  Index Scan using Stage3_Output_file_pkey on Stage3_Output_file stage3  (cost=0.00..8178.35 rows=183871 width=36) (actual time=0.015..317.698 rows=173986 loops=1)
                    ->  Index Scan using analysis_file_pkey on analysis_file Stage2_Output  (cost=0.00..8168.99 rows=183718 width=36) (actual time=0.014..317.776 rows=173986 loops=1)
              ->  Index Scan using Stage1_output_file_pkey on Stage1_output_file stg1  (cost=0.00..8199.07 rows=183856 width=36) (actual time=0.014..321.648 rows=173986 loops=1)
        ->  Index Scan using input_file_pkey on input_file input  (cost=0.00..8618.05 rows=183788 width=36) (actual time=0.014..328.968 rows=173986 loops=1)
  ->  Materialize  (cost=0.00..39.59 rows=10 width=75) (actual time=0.046..0.150 rows=4 loops=1)
        ->  Nested Loop Left Join  (cost=0.00..39.49 rows=10 width=75) (actual time=0.039..0.128 rows=4 loops=1)
              Join Filter: (t.id = d.input_quality)
              ->  Index Scan using input_data_exists_index on input_data d  (cost=0.00..28.59 rows=10 width=45) (actual time=0.013..0.025 rows=4 loops=1)
                    Index Cond: (test_session = 1040)
              ->  Seq Scan on quality_codes t  (cost=0.00..1.04 rows=4 width=38) (actual time=0.002..0.009 rows=4 loops=4)
Total runtime: 10680.902 ms

The underlying view is our "full results" view, defined as:

SELECT p.data_id, p.x2, istr.str AS impression, input.h, p.x3, p.x3, p.x4, s.x5,
       s.x6, s.x7, s.x8, s.x9, s.x10, s.x11, s.x12, s.x13, s.x14, t.x15,
       t.x16, t.x17, t.x18, t.x19, t.x20, t.x21, t.x22, t.x23,
       input.data AS input, stage1_output_file.data AS stage1, 
       stage2_output_file.data AS stage2, stage3_output_file.data AS stage3
FROM category1_results p, category1_impression_str istr, input_file input,
    stage1_output_file, stage2_output_file, stage3_output_file, 
    category2_results s, category3_results t
WHERE p.impression = istr.id AND p.data_id = input.data_id AND p.data_id = stage1_output_file.data_id
      AND p.data_id = stage2_output_file.data_id AND p.data_id = stage3_output_file.data_id AND p.data_id = s.data_id AND p.data_id = t.data_id;

The overlay view, which produced the query plan above, is defined as:

SELECT d.data_id, d.test_session, d.a, d.b, t.c, d.d, d.e, d.f, r.*
FROM input_data d LEFT JOIN quality_codes t ON t.id = d.input_quality
     LEFT JOIN full_results r ON r.data_id = d.data_id  
WHERE NOT d.deleted;

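We seem to spend most of our time joining our entire data set all the way up the chain, and I am fairly sure that is our performance problem. Does anyone have suggestions for ways to optimize this pig?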

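I am speculating here, but my guess is that the fact that you LEFT JOIN to the view makes the planner compute the result of the whole view before joining it to the first part of the query.

Try inlining the query from the view and making it a JOIN instead of a LEFT JOIN, to see whether the planner now finds a faster way: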

SELECT d.data_id, d.test_session, d.a, d.b, q.c, d.d, d.e, d.f
    , p.data_id AS p_data_id, p.x2, c.str AS impression, i.h
    , p.x3, p.x3, p.x4
    , s.x5, s.x6, s.x7, s.x8, s.x9, s.x10, s.x11, s.x12, s.x13, s.x14
    , t.x15, t.x16, t.x17, t.x18, t.x19, t.x20, t.x21, t.x22, t.x23
    , i.data AS input
    , s1.data AS stage1, s2.data AS stage2, s3.data AS stage3
FROM   input_data d
JOIN   category1_results        p ON p.data_id = d.data_id
-- explicit ON conditions: USING (data_id) would be ambiguous here,
-- because both d and p already expose a data_id column
JOIN   input_file               i ON i.data_id = p.data_id
JOIN   stage1_output_file      s1 ON s1.data_id = p.data_id
JOIN   stage2_output_file      s2 ON s2.data_id = p.data_id
JOIN   stage3_output_file      s3 ON s3.data_id = p.data_id
JOIN   category2_results        s ON s.data_id = p.data_id
JOIN   category3_results        t ON t.data_id = p.data_id
JOIN   category1_impression_str c ON c.id = p.impression
-- alias q, since t is already taken by category3_results
LEFT   JOIN quality_codes       q ON q.id = d.input_quality
WHERE  NOT d.deleted;

I cleaned up your syntax to make it more manageable, and added an alias for the second data_id column so that the query can execute.
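
One way to check whether the planner now picks a cheaper plan is to run the inlined query under EXPLAIN ANALYZE with the same kind of session filter used in the question. The sketch below is trimmed to three of the result tables and a handful of columns to keep it short. The filter on d.test_session is an assumption about how the query would be called, since the inlined query above does not include it.

```sql
-- Sketch only: shortened select list, three of the joined tables,
-- and an assumed session filter taken from the question's example.
EXPLAIN ANALYZE
SELECT d.data_id, d.test_session, p.x2, s.x5, t.x15
FROM   input_data d
JOIN   category1_results p ON p.data_id = d.data_id
JOIN   category2_results s ON s.data_id = d.data_id
JOIN   category3_results t ON t.data_id = d.data_id
WHERE  NOT d.deleted
AND    d.test_session = 12345;
```

If the guess above is right, the plan should start from the index scan on input_data and probe the other tables for just those few rows, rather than merge-joining all of the roughly 174,000 rows first.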

If this results in a much faster execution time, you could try adding back the rows that go missing because of the INNER JOIN, like this:

SELECT DISTINCT ON (1,2,3,4,5,6,7,8) *
FROM (
   <<query>>
   UNION ALL
   -- one NULL-padded row per input row, to stand in for missing results
   SELECT d.data_id, d.test_session, d.a, d.b, t.c, d.d, d.e, d.f
        ,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL
        ,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL
        ,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL
   FROM   input_data d
   LEFT   JOIN quality_codes t ON t.id = d.input_quality
   WHERE  NOT d.deleted
   ) x
ORDER  BY 1,2,3,4,5,6,7,8, 9 NULLS LAST; -- p.data_id is otherwise not null
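
The DISTINCT ON plus NULLS LAST trick is easier to see on a toy data set. The following is a minimal, self-contained sketch with made-up names (joined, padded, id, payload are illustrative only and do not exist in the schema above): joined plays the role of the inner-join result, and padded plays the role of the NULL-padded rows.

```sql
-- Toy illustration with hypothetical names, not the real schema.
WITH joined(id, payload) AS (
   -- rows that survived the inner join
   VALUES (1, 'real'::text), (2, 'real'::text)
), padded(id, payload) AS (
   -- one NULL-padded row for every input row, matched or not
   SELECT id, NULL::text FROM (VALUES (1), (2), (3)) AS d(id)
)
SELECT DISTINCT ON (id) id, payload
FROM (
   SELECT * FROM joined
   UNION ALL
   SELECT * FROM padded
   ) u
ORDER  BY id, payload NULLS LAST;  -- real row sorts first, so it is the one kept
```

For ids 1 and 2 the row with the real payload is kept, and for id 3 only the NULL-padded row exists, so it is returned as is. That is the same effect a LEFT JOIN would have produced.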

Source: https://dba.stackexchange.com/questions/17126
