

  • April 27, 2012


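  • Input data (data ID, session ID, and some statistically relevant fields)
  • Input files (data ID and blob)
  • Stage 1 output files (data ID and blob)
  • Stage 2 output files (data ID and blob)
  • Category 1 results (data ID and some booleans)
  • Category 2 results (data ID and some integers)
  • Category 3 results (data ID and some integers)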

Each table has about 200,000 rows.

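I also have a view that basically glues all of these together, so that I can SELECT a bunch of IDs (usually selecting them by session ID) and see all the related data on one page.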


> EXPLAIN ANALYZE SELECT * FROM overlay WHERE test_session=12345;

                QUERY PLAN
Merge Right Join  (cost=7.19..74179.49 rows=10 width=305) (actual time=10680.129..10680.494 rows=4 loops=1)
  Merge Cond: (p.data_id =
  ->  Merge Join  (cost=7.19..75077.04 rows=183718 width=234) (actual time=0.192..10434.995 rows=173986 loops=1)
        Merge Cond: (p.data_id = input_file.data_id)
        ->  Merge Join  (cost=7.19..69917.74 rows=183718 width=222) (actual time=0.173..9255.653 rows=173986 loops=1)
              Merge Cond: (p.data_id = stage1_output_file.data_id)
              ->  Merge Join  (cost=5.50..62948.54 rows=183718 width=186) (actual time=0.153..8081.949 rows=173986 loops=1)
                    Merge Cond: (p.data_id = stage2_output_file.data_id)
                    ->  Merge Join  (cost=3.90..55217.36 rows=183723 width=150) (actual time=0.132..6918.814 rows=173986 loops=1)
                          Merge Cond: (p.data_id = stage3_output_file.data_id)
                          ->  Nested Loop  (cost=2.72..47004.01 rows=183723 width=114) (actual time=0.111..5753.105 rows=173986 loops=1)
                                Join Filter: (p.impression =
                                ->  Merge Join  (cost=1.68..30467.90 rows=183723 width=102) (actual time=0.070..2675.733 rows=173986 loops=1)
                                      Merge Cond: (p.data_id = s.data_id)
                                      ->  Merge Join  (cost=1.68..19031.56 rows=183723 width=58) (actual time=0.049..1501.546 rows=173986 loops=1)
                                            Merge Cond: (p.data_id = t.data_id)
                                            ->  Index Scan using Category1_Results_pkey on Category1_Results p  (cost=0.00..7652.17 rows=183723 width=18) (actual time=0.025..315.531 rows=173986 loops=1)
                                            ->  Index Scan using Category3_Results_pkey on Category3_Results t  (cost=0.00..8624.43 rows=183787 width=40) (actual time=0.016..321.460 rows=173986 loops=1)
                                      ->  Index Scan using Category2_Results_pkey on Category2_Results s  (cost=0.00..8681.47 rows=183787 width=44) (actual time=0.014..320.794 rows=173986 loops=1)
                                ->  Materialize  (cost=1.04..1.08 rows=4 width=20) (actual time=0.001..0.007 rows=4 loops=173986)
                                      ->  Seq Scan on Category1_impression_str istr  (cost=0.00..1.04 rows=4 width=20) (actual time=0.005..0.012 rows=4 loops=1)
                          ->  Index Scan using Stage3_Output_file_pkey on Stage3_Output_file stage3  (cost=0.00..8178.35 rows=183871 width=36) (actual time=0.015..317.698 rows=173986 loops=1)
                    ->  Index Scan using analysis_file_pkey on analysis_file Stage2_Output  (cost=0.00..8168.99 rows=183718 width=36) (actual time=0.014..317.776 rows=173986 loops=1)
              ->  Index Scan using Stage1_output_file_pkey on Stage1_output_file stg1  (cost=0.00..8199.07 rows=183856 width=36) (actual time=0.014..321.648 rows=173986 loops=1)
        ->  Index Scan using input_file_pkey on input_file input  (cost=0.00..8618.05 rows=183788 width=36) (actual time=0.014..328.968 rows=173986 loops=1)
  ->  Materialize  (cost=0.00..39.59 rows=10 width=75) (actual time=0.046..0.150 rows=4 loops=1)
        ->  Nested Loop Left Join  (cost=0.00..39.49 rows=10 width=75) (actual time=0.039..0.128 rows=4 loops=1)
              Join Filter: ( = d.input_quality)
              ->  Index Scan using input_data_exists_index on input_data d  (cost=0.00..28.59 rows=10 width=45) (actual time=0.013..0.025 rows=4 loops=1)
                    Index Cond: (test_session = 1040)
              ->  Seq Scan on quality_codes t  (cost=0.00..1.04 rows=4 width=38) (actual time=0.002..0.009 rows=4 loops=4)
Total runtime: 10680.902 ms


SELECT p.data_id, p.x2, istr.str AS impression, input.h, p.x3, p.x3, p.x4, s.x5,
       s.x6, s.x7, s.x8, s.x9, s.x10, s.x11, s.x12, s.x13, s.x14, t.x15,
       t.x16, t.x17, t.x18, t.x19, t.x20, t.x21, t.x22, t.x23, AS input, AS stage1, AS stage2, AS stage3
FROM category1_results p, category1_impression_str istr, input_file input,
    stage1_output_file, stage2_output_file, stage3_output_file, 
    category2_results s, category3_results t
WHERE p.impression = AND p.data_id = input.data_id AND p.data_id = stage1_output_file.data_id
      AND p.data_id = stage2_output_file.data_id AND p.data_id = stage3_output_file.data_id AND p.data_id = s.data_id AND p.data_id = t.data_id;                                  


SELECT d.data_id, d.test_session, d.a, d.b, t.c, d.d, d.e, d.f, r.*
FROM input_data d LEFT JOIN quality_codes t ON = d.input_quality
     LEFT JOIN full_results r ON r.data_id = d.data_id  
WHERE NOT d.deleted;


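I'm speculating here, but I'd guess that the LEFT JOIN to your view makes the planner compute the result of the whole view before joining it to the first part of the query. The plan above is consistent with that: the chain of merge joins produces all 173,986 rows of the view (about 10.4 s of the 10.7 s total) before the final Merge Right Join reduces them to the 4 rows for the requested session.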

Try inlining the query from the view and making it a JOIN instead of a LEFT JOIN, to see whether the planner then finds a faster plan:

SELECT d.data_id, d.test_session, d.a, d.b, q.c, d.d, d.e, d.f
    , p.data_id AS p_data_id, p.x2, c.str AS impression, i.h
    , p.x3, p.x3, p.x4
    , s.x5, s.x6, s.x7, s.x8, s.x9, s.x10, s.x11, s.x12, s.x13, s.x14
    , t.x15, t.x16, t.x17, t.x18, t.x19, t.x20, t.x21, t.x22, t.x23
    , AS input
    , AS stage1, AS stage2, AS stage3
FROM   input_data d
JOIN   category1_results        p USING (data_id)  -- USING here too, so the USING joins below are not ambiguous
JOIN   input_file               i USING (data_id)
JOIN   stage1_output_file      s1 USING (data_id)
JOIN   stage2_output_file      s2 USING (data_id)
JOIN   stage3_output_file      s3 USING (data_id)
JOIN   category2_results        s USING (data_id)
JOIN   category3_results        t USING (data_id)
JOIN   category1_impression_str c ON p.impression = 
LEFT   JOIN quality_codes       q ON = d.input_quality  -- alias q, because t is already category3_results
WHERE  NOT d.deleted;


If this makes the query run much faster, you can try adding back the rows that the INNER JOIN drops, like this:

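SELECT DISTINCT ON (1,2,3,4,5,6,7,8) *
FROM  (
   -- the INNER JOIN query from above goes here, without its trailing semicolon
   UNION ALL
   SELECT d.data_id, d.test_session, d.a, d.b, t.c, d.d, d.e, d.f
        , NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL  -- one NULL for each of
        , NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL  -- the 30 remaining columns
        , NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL  -- of the query above
   FROM   input_data d
   LEFT   JOIN quality_codes t ON = d.input_quality
   WHERE  NOT d.deleted
   ) x
ORDER  BY 1,2,3,4,5,6,7,8, 9 NULLS LAST; -- p.data_id is otherwise not null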
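To illustrate why DISTINCT ON combined with ORDER BY ... NULLS LAST restores the missing rows, here is a minimal sketch on two hypothetical toy tables, demo_input and demo_result (stand-ins for input_data and one of the result tables; the names and data are made up for illustration). Every non-deleted input row appears once in the NULL-padded branch, and a second time in the inner-join branch if it has a match. Sorting the second column with NULLS LAST puts the matched row first within each data_id, so DISTINCT ON keeps it when it exists and falls back to the NULL row otherwise. It assumes PostgreSQL, since DISTINCT ON is a PostgreSQL extension.

```sql
-- Hypothetical toy tables standing in for input_data and one result table.
CREATE TEMP TABLE demo_input  (data_id int PRIMARY KEY, deleted boolean NOT NULL);
CREATE TEMP TABLE demo_result (data_id int PRIMARY KEY, score int);

INSERT INTO demo_input  VALUES (1, false), (2, false), (3, false), (4, true);
INSERT INTO demo_result VALUES (1, 10), (3, 30);  -- no result row for data_id 2

CREATE TEMP VIEW demo_restored AS
SELECT DISTINCT ON (1) *
FROM  (
   -- inner-join branch: only input rows that have a result
   SELECT d.data_id, r.data_id AS r_data_id, r.score
   FROM   demo_input d
   JOIN   demo_result r ON r.data_id = d.data_id
   WHERE  NOT d.deleted
   UNION ALL
   -- fallback branch: every non-deleted input row, padded with NULLs
   SELECT d.data_id, NULL, NULL
   FROM   demo_input d
   WHERE  NOT d.deleted
   ) x
-- within each data_id a non-NULL r_data_id sorts first, so DISTINCT ON keeps the joined row
ORDER  BY 1, 2 NULLS LAST;

SELECT * FROM demo_restored;
```

The result should match demo_input LEFT JOIN demo_result restricted to non-deleted rows: data_id 1 and 3 carry their scores, data_id 2 has NULLs, and the deleted row 4 is absent.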
