Postgresql

加快查詢計算。我在哪裡可以添加索引或優化查詢或伺服器?

  • March 3, 2017

我希望加快單個表上的一些計算。

這是這張表,我相信它有超過 9300 萬行並且每天都在增長:

CREATE TABLE daily_data
(
 id serial NOT NULL,
 company_id integer NOT NULL,
 trade_date date NOT NULL,
 daily_val numeric NOT NULL,
 bbg_pulls_id integer,
 gen_qtr_end_dt_id integer,
 ern_release_date_id integer,
 wh_calc_id integer,
 CONSTRAINT daily_data_pkey PRIMARY KEY (id),
 CONSTRAINT daily_data_bbg_pulls_id_fkey FOREIGN KEY (bbg_pulls_id)
     REFERENCES bbg_pulls (id) MATCH SIMPLE
     ON UPDATE NO ACTION ON DELETE NO ACTION,
 CONSTRAINT daily_data_company_id_fkey FOREIGN KEY (company_id)
     REFERENCES company (id) MATCH SIMPLE
     ON UPDATE NO ACTION ON DELETE NO ACTION,
 CONSTRAINT daily_data_ern_release_date_id_fkey FOREIGN KEY (ern_release_date_id)
     REFERENCES ern_dt (id) MATCH SIMPLE
     ON UPDATE NO ACTION ON DELETE NO ACTION,
 CONSTRAINT daily_data_wh_calc_id_fkey FOREIGN KEY (wh_calc_id)
     REFERENCES wh_calc (id) MATCH SIMPLE
     ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY IMMEDIATE,
 CONSTRAINT daily_data_company_id_trade_date_bbg_pulls_id_key UNIQUE (company_id, trade_date, bbg_pulls_id),
 CONSTRAINT daily_data_company_id_trade_date_wh_calc_id_key UNIQUE (company_id, trade_date, wh_calc_id),
 CONSTRAINT daily_data_check CHECK ((wh_calc_id IS NULL) <> (bbg_pulls_id IS NULL))
)

CREATE INDEX daily_data_bbg_pulls_id_idx
 ON daily_data
 USING btree
 (bbg_pulls_id)
 WHERE bbg_pulls_id IS NOT NULL;

CREATE INDEX daily_data_company_id_idx
 ON daily_data
 USING btree
 (company_id);

CREATE INDEX daily_data_gen_qtr_end_dt_id_idx
 ON daily_data
 USING btree
 (gen_qtr_end_dt_id)
 WHERE gen_qtr_end_dt_id IS NOT NULL;

CREATE INDEX daily_data_trade_date_idx
 ON daily_data
 USING btree
 (trade_date);

CREATE INDEX daily_data_wh_calc_id_idx
 ON daily_data
 USING btree
 (wh_calc_id)
 WHERE wh_calc_id IS NOT NULL;

這是我實際/最終想要做的事情:

with dd2 as (select * from daily_data where wh_calc_id = 241 -- <- the 241 value is passed into a function where this is used
) 
INSERT INTO daily_data (
       company_id
       ,trade_date
       ,daily_val
       ,wh_calc_id
       )
SELECT  d.company_id
   ,d.trade_date
   , round(CASE WHEN x.ct = 0 THEN numeric '1'  
                 ELSE x.ct_lt / x.ct END, 6) AS pctl_calc
       ,1 -- <-- dummy value, value is passed into the function where this query is used
FROM   dd2 d, LATERAL (
  SELECT count(daily_val) AS ct
       , count(daily_val < d.daily_val OR NULL)::numeric As ct_lt
  FROM   dd2
  WHERE  company_id = d.company_id
--   and     company_id < 8
  AND    trade_date < d.trade_date
  and    wh_calc_id = d.wh_calc_id
  -- and     wh_calc_id = 241
  ) x

我正在通過這樣做來測試它(基本上除了插入之外的所有東西),但它在我取消它之前執行了 16 個小時:

with dd2 as (select * from daily_data where wh_calc_id = 241 
) 
SELECT d.*
    , round(CASE WHEN x.ct = 0 THEN numeric '1'  
                 ELSE x.ct_lt / x.ct END, 6) AS pctl_calc
FROM   dd2 d, LATERAL (
  SELECT count(daily_val) AS ct
       , count(daily_val < d.daily_val OR NULL)::numeric As ct_lt
  FROM   dd2
  WHERE  company_id = d.company_id
--   and     company_id < 8
  AND    trade_date < d.trade_date
  and    wh_calc_id = d.wh_calc_id
  -- and     wh_calc_id = 241

  ) x
ORDER  BY company_id, trade_date;

所以我執行一個子集 ( AND company_id < 8) 來得到一個解釋分析,這需要不到 3 分鐘:解釋分析

with dd2 as (select * from daily_data where wh_calc_id = 241 AND company_id < 8 
) 
SELECT d.*
    , round(CASE WHEN x.ct = 0 THEN numeric '1'  
                 ELSE x.ct_lt / x.ct END, 6) AS pctl_calc
FROM   dd2 d, LATERAL (
  SELECT count(daily_val) AS ct
       , count(daily_val < d.daily_val OR NULL)::numeric As ct_lt
  FROM   dd2
  WHERE  company_id = d.company_id
  --   and       company_id < 8
  AND    trade_date < d.trade_date
  and    wh_calc_id = d.wh_calc_id
  -- and     wh_calc_id = 241

  ) x
ORDER  BY company_id, trade_date;

這是解釋分析輸出:

"Sort  (cost=8.56..8.57 rows=1 width=100) (actual time=219363.049..219367.217 rows=24444 loops=1)"
"  Sort Key: d.company_id, d.trade_date"
"  Sort Method: external merge  Disk: 1264kB"
"  CTE dd2"
"    ->  Index Scan using daily_data_wh_calc_id_idx on daily_data  (cost=0.43..8.46 rows=1 width=34) (actual time=0.415..70805.295 rows=24444 loops=1)"
"          Index Cond: (wh_calc_id = 241)"
"          Filter: (company_id < 8)"
"          Rows Removed by Filter: 8661143"
"  ->  Nested Loop  (cost=0.04..0.10 rows=1 width=100) (actual time=70835.311..219272.273 rows=24444 loops=1)"
"        ->  CTE Scan on dd2 d  (cost=0.00..0.02 rows=1 width=60) (actual time=0.423..64.374 rows=24444 loops=1)"
"        ->  Aggregate  (cost=0.04..0.05 rows=1 width=32) (actual time=8.965..8.965 rows=1 loops=24444)"
"              ->  CTE Scan on dd2  (cost=0.00..0.03 rows=1 width=32) (actual time=5.201..8.203 rows=2422 loops=24444)"
"                    Filter: ((trade_date < d.trade_date) AND (company_id = d.company_id) AND (wh_calc_id = d.wh_calc_id))"
"                    Rows Removed by Filter: 22022"
"Total runtime: 219374.219 ms"

注意我使用 CTE 並讓事情執行得更快。即使使用子集測試查詢,我也會遇到記憶體錯誤和/或事情會永遠執行。最終所有這些都將發生在一個函式中,因此這裡硬編碼的值實際上被傳遞到函式中,但我認為這與問題無關。

有關該特定查詢/計算的更多資訊,請訪問此處: 在 PostgreSQL 9.3.5 中採用排序參數(或相同功能)的百分比排名

我去這裡試圖弄清楚自己無濟於事(為我的鞋子提供): https ://explain.depesz.com/

我沒有看到可以在哪裡添加索引以加快速度。

問題1:我能做些什麼來加快這個查詢?我只關注插入的選擇部分,但如果您對如何加快插入速度(除了刪除索引)有想法,我會很高興聽到它們。

問題 2:從伺服器資源的角度來看,我可以做些什麼來讓計算/查詢執行得更快?

SELECT version() “x86_64-suse-linux-gnu 上的 PostgreSQL 9.3.5,由 gcc (SUSE Linux) 4.8.3 20140627 編譯

$$ gcc-4_8-branch revision 212064 $$, 64 位” 記憶體 2048 MB,CPU:2 個 vCPU,預置儲存:202.11 GB

我知道我可以刪除一些查詢中沒有用到的索引來加快插入速度。如果需要,我稍後會這樣做。現在我只想讓選擇部分執行得更快。其他索引用於其他重要查詢。

為了測試我創建了小樣本表:

INSERT INTO daily_data (
       company_id
       ,trade_date
       ,daily_val
       ,wh_calc_id
       )
SELECT
       (random() * 10)::int,
       '2010-01-01'::timestamp + (i::text || ' day')::interval,
       random() * 100,
       (random() * 20) + 230
FROM
       generate_series(1, 200000) AS i
;

原始查詢:

EXPLAIN ANALYZE
with dd2 as (select * from daily_data where wh_calc_id = 241 AND company_id < 8 
) 
SELECT d.*
, ct, ct_lt
    , round(CASE WHEN x.ct = 0 THEN numeric '1'  
                 ELSE x.ct_lt / x.ct END, 6) AS pctl_calc
FROM   dd2 d, LATERAL (
  SELECT count(daily_val) AS ct
       , count(daily_val < d.daily_val OR NULL)::numeric As ct_lt
  FROM   dd2
  WHERE  company_id = d.company_id
  --   and       company_id < 8
  AND    trade_date < d.trade_date
  and    wh_calc_id = d.wh_calc_id
  -- and     wh_calc_id = 241

  ) x
ORDER  BY company_id, trade_date;

子查詢版本:

with dd2 as (
       select company_id, trade_date
  , (SELECT count(daily_val) FROM daily_data WHERE company_id = d.company_id AND trade_date < d.trade_date and wh_calc_id = d.wh_calc_id) AS ct
  , (SELECT count(daily_val < d.daily_val OR NULL)::numeric FROM daily_data WHERE company_id = d.company_id AND trade_date < d.trade_date and wh_calc_id = d.wh_calc_id) AS ct_lt
       from daily_data AS d where wh_calc_id = 241 AND company_id < 8 
)
SELECT d.*
    , round(CASE WHEN d.ct = 0 THEN numeric '1'  
                 ELSE d.ct_lt / d.ct END, 6) AS pctl_calc
FROM   dd2 d
ORDER  BY company_id, trade_date;

部分視窗+部分子查詢版本:#

EXPLAIN ANALYZE
with dd2 as (select *,
       count(daily_val) OVER (PARTITION BY company_id, wh_calc_id ORDER BY trade_date) - 1 AS ct,
       (SELECT count(daily_val < d.daily_val OR NULL)::numeric FROM daily_data WHERE company_id = d.company_id AND trade_date < d.trade_date and wh_calc_id = d.wh_calc_id) AS ct_lt
       from daily_data AS d where wh_calc_id = 241 AND company_id < 8 
) 
SELECT d.*
    , round(CASE WHEN ct = 0 THEN numeric '1'  
                 ELSE ct_lt / ct END, 6) AS pctl_calc
FROM   dd2 d
ORDER  BY company_id, trade_date;

純視窗版本(但有些棘手):

EXPLAIN ANALYZE
with dd2 as (select d.company_id, d.trade_date, d.daily_val,
       count(daily_val) OVER w - 1 AS ct,
       array_agg(daily_val) OVER w AS ct_lt2
       from daily_data AS d
       where wh_calc_id = 241 AND company_id < 8 
       WINDOW w AS (PARTITION BY company_id, wh_calc_id ORDER BY trade_date)
) , dd1 AS (  
SELECT d.*
, (SELECT count(*) FILTER (WHERE u < d.daily_val) FROM unnest(ct_lt2) AS u) AS ct_lt

FROM   dd2 d
) SELECT *
    , round(CASE WHEN ct = 0 THEN numeric '1'  
                 ELSE ct_lt / ct END, 6) AS pctl_calc
FROM dd1 AS d
ORDER  BY company_id, trade_date;

原始查詢:

                                                                  QUERY PLAN                                                                    
-------------------------------------------------------------------------------------------------------------------------------------------------
Sort  (cost=2653.28..2653.84 rows=226 width=132) (actual time=8399.345..8399.886 rows=7540 loops=1)
  Sort Key: d.company_id, d.trade_date
  Sort Method: quicksort  Memory: 1253kB
  CTE dd2
    ->  Bitmap Heap Scan on daily_data  (cost=21.55..1224.03 rows=226 width=60) (actual time=1.026..4.794 rows=7540 loops=1)
          Recheck Cond: (wh_calc_id = 241)
          Filter: (company_id < 8)
          Rows Removed by Filter: 2526
          Heap Blocks: exact=1470
          ->  Bitmap Index Scan on daily_data_wh_calc_id_idx  (cost=0.00..21.50 rows=677 width=0) (actual time=0.848..0.848 rows=10066 loops=1)
                Index Cond: (wh_calc_id = 241)
  ->  Nested Loop  (cost=6.22..1420.41 rows=226 width=132) (actual time=6.196..8391.961 rows=7540 loops=1)
        ->  CTE Scan on dd2 d  (cost=0.00..4.52 rows=226 width=60) (actual time=1.029..1.933 rows=7540 loops=1)
        ->  Aggregate  (cost=6.22..6.24 rows=1 width=40) (actual time=1.111..1.111 rows=1 loops=7540)
              ->  CTE Scan on dd2  (cost=0.00..6.22 rows=1 width=32) (actual time=0.003..0.995 rows=484 loops=7540)
                    Filter: ((trade_date < d.trade_date) AND (company_id = d.company_id) AND (wh_calc_id = d.wh_calc_id))
                    Rows Removed by Filter: 7056
Planning time: 0.345 ms
Execution time: 8400.233 ms
(19 rows)

子查詢版本:

                                                                                             QUERY PLAN                                                                                              
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort  (cost=9949.69..9950.25 rows=226 width=80) (actual time=10558.505..10559.017 rows=7540 loops=1)
  Sort Key: d.company_id, d.trade_date
  Sort Method: quicksort  Memory: 782kB
  CTE dd2
    ->  Bitmap Heap Scan on daily_data d_1  (cost=21.55..9934.07 rows=226 width=48) (actual time=1.138..10538.523 rows=7540 loops=1)
          Recheck Cond: (wh_calc_id = 241)
          Filter: (company_id < 8)
          Rows Removed by Filter: 2526
          Heap Blocks: exact=1470
          ->  Bitmap Index Scan on daily_data_wh_calc_id_idx  (cost=0.00..21.50 rows=677 width=0) (actual time=0.945..0.945 rows=10066 loops=1)
                Index Cond: (wh_calc_id = 241)
          SubPlan 1
            ->  Aggregate  (cost=19.26..19.27 rows=1 width=8) (actual time=0.722..0.722 rows=1 loops=7540)
                  ->  Index Scan using daily_data_company_id_trade_date_wh_calc_id_key on daily_data  (cost=0.42..19.26 rows=1 width=32) (actual time=0.007..0.671 rows=484 loops=7540)
                        Index Cond: ((company_id = d_1.company_id) AND (trade_date < d_1.trade_date) AND (wh_calc_id = d_1.wh_calc_id))
          SubPlan 2
            ->  Aggregate  (cost=19.26..19.27 rows=1 width=32) (actual time=0.672..0.672 rows=1 loops=7540)
                  ->  Index Scan using daily_data_company_id_trade_date_wh_calc_id_key on daily_data daily_data_1  (cost=0.42..19.26 rows=1 width=32) (actual time=0.006..0.551 rows=484 loops=7540)
                        Index Cond: ((company_id = d_1.company_id) AND (trade_date < d_1.trade_date) AND (wh_calc_id = d_1.wh_calc_id))
  ->  CTE Scan on dd2 d  (cost=0.00..6.78 rows=226 width=80) (actual time=1.142..10552.746 rows=7540 loops=1)
Planning time: 0.248 ms
Execution time: 10559.387 ms
(22 rows)

部分視窗+部分子查詢版本:

                                                                                      QUERY PLAN                                                                                        
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort  (cost=5609.72..5610.28 rows=226 width=132) (actual time=4931.963..4932.433 rows=7540 loops=1)
  Sort Key: d.company_id, d.trade_date
  Sort Method: quicksort  Memory: 1253kB
  CTE dd2
    ->  WindowAgg  (cost=1232.87..5594.10 rows=226 width=100) (actual time=9.498..4914.892 rows=7540 loops=1)
          ->  Sort  (cost=1232.87..1233.43 rows=226 width=60) (actual time=9.468..11.291 rows=7540 loops=1)
                Sort Key: d_1.company_id, d_1.trade_date
                Sort Method: quicksort  Memory: 782kB
                ->  Bitmap Heap Scan on daily_data d_1  (cost=21.55..1224.03 rows=226 width=60) (actual time=1.002..5.251 rows=7540 loops=1)
                      Recheck Cond: (wh_calc_id = 241)
                      Filter: (company_id < 8)
                      Rows Removed by Filter: 2526
                      Heap Blocks: exact=1470
                      ->  Bitmap Index Scan on daily_data_wh_calc_id_idx  (cost=0.00..21.50 rows=677 width=0) (actual time=0.814..0.814 rows=10066 loops=1)
                            Index Cond: (wh_calc_id = 241)
          SubPlan 1
            ->  Aggregate  (cost=19.26..19.27 rows=1 width=32) (actual time=0.649..0.649 rows=1 loops=7540)
                  ->  Index Scan using daily_data_company_id_trade_date_wh_calc_id_key on daily_data  (cost=0.42..19.26 rows=1 width=32) (actual time=0.005..0.529 rows=484 loops=7540)
                        Index Cond: ((company_id = d_1.company_id) AND (trade_date < d_1.trade_date) AND (wh_calc_id = d_1.wh_calc_id))
  ->  CTE Scan on dd2 d  (cost=0.00..6.78 rows=226 width=132) (actual time=9.504..4928.264 rows=7540 loops=1)
Planning time: 0.223 ms
Execution time: 4932.854 ms
(22 rows)

純視窗版本(但有些棘手):

                                                                        QUERY PLAN                                                                          
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort  (cost=1601.04..1601.61 rows=226 width=120) (actual time=2006.537..2018.331 rows=7540 loops=1)
  Sort Key: d.company_id, d.trade_date
  Sort Method: quicksort  Memory: 63526kB
  CTE dd2
    ->  WindowAgg  (cost=1232.87..1239.08 rows=226 width=84) (actual time=8.298..86.164 rows=7540 loops=1)
          ->  Sort  (cost=1232.87..1233.43 rows=226 width=44) (actual time=8.285..9.752 rows=7540 loops=1)
                Sort Key: d_1.company_id, d_1.trade_date
                Sort Method: quicksort  Memory: 782kB
                ->  Bitmap Heap Scan on daily_data d_1  (cost=21.55..1224.03 rows=226 width=44) (actual time=0.935..5.078 rows=7540 loops=1)
                      Recheck Cond: (wh_calc_id = 241)
                      Filter: (company_id < 8)
                      Rows Removed by Filter: 2526
                      Heap Blocks: exact=1470
                      ->  Bitmap Index Scan on daily_data_wh_calc_id_idx  (cost=0.00..21.50 rows=677 width=0) (actual time=0.722..0.722 rows=10066 loops=1)
                            Index Cond: (wh_calc_id = 241)
  CTE dd1
    ->  CTE Scan on dd2 d_2  (cost=0.00..346.34 rows=226 width=88) (actual time=8.315..1573.920 rows=7540 loops=1)
          SubPlan 2
            ->  Aggregate  (cost=1.50..1.51 rows=1 width=8) (actual time=0.155..0.156 rows=1 loops=7540)
                  ->  Function Scan on unnest u  (cost=0.00..1.00 rows=100 width=32) (actual time=0.034..0.058 rows=485 loops=7540)
  ->  CTE Scan on dd1 d  (cost=0.00..6.78 rows=226 width=120) (actual time=8.321..1774.450 rows=7540 loops=1)
Planning time: 0.177 ms
Execution time: 2443.855 ms
(23 rows)

結論

Windows版本是最好的。我發現沒有簡單的方法來轉換ct_lt它的計算。作為一種解決方法,我使用了數組中值的聚合併ct_lt從中計數。可能,它會對大量數據造成一些損失。但即便如此,這個版本也會比其他版本更好。

UPS:

適應 9.4:

EXPLAIN ANALYZE
with dd2 as (select d.company_id, d.trade_date, d.daily_val,
       count(daily_val) OVER w - 1 AS ct,
       array_agg(daily_val) OVER w AS ct_lt2
       from daily_data AS d
       where wh_calc_id = 241 AND company_id < 8 
       WINDOW w AS (PARTITION BY company_id, wh_calc_id ORDER BY trade_date)
) , dd1 AS (  
SELECT d.*
, (SELECT count(*) FROM unnest(ct_lt2) AS u WHERE u < d.daily_val) AS ct_lt

FROM   dd2 d
) SELECT *
    , round(CASE WHEN ct = 0 THEN numeric '1'  
                 ELSE ct_lt::numeric / ct END, 6) AS pctl_calc
FROM dd1 AS d
ORDER  BY company_id, trade_date;

指數:

CREATE INDEX daily_data_i1_idx
 ON daily_data
 (wh_calc_id, company_id, trade_date);
                                                                    QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
Sort  (cost=1311.98..1312.82 rows=333 width=120) (actual time=1597.558..1613.481 rows=7407 loops=1)
  Sort Key: d.company_id, d.trade_date
  Sort Method: quicksort  Memory: 61659kB
  CTE dd2
    ->  WindowAgg  (cost=824.34..833.50 rows=333 width=84) (actual time=10.416..87.668 rows=7407 loops=1)
          ->  Sort  (cost=824.34..825.17 rows=333 width=44) (actual time=10.381..11.851 rows=7407 loops=1)
                Sort Key: d_1.company_id, d_1.trade_date
                Sort Method: quicksort  Memory: 771kB
                ->  Bitmap Heap Scan on daily_data d_1  (cost=11.83..810.39 rows=333 width=44) (actual time=2.934..6.548 rows=7407 loops=1)
                      Recheck Cond: ((wh_calc_id = 241) AND (company_id < 8))
                      Heap Blocks: exact=1460
                      ->  Bitmap Index Scan on daily_data_i1_idx  (cost=0.00..11.75 rows=333 width=0) (actual time=2.661..2.661 rows=7407 loops=1)
                            Index Cond: ((wh_calc_id = 241) AND (company_id < 8))
  CTE dd1
    ->  CTE Scan on dd2 d_2  (cost=0.00..454.55 rows=333 width=88) (actual time=10.442..1320.450 rows=7407 loops=1)
          SubPlan 2
            ->  Aggregate  (cost=1.33..1.34 rows=1 width=8) (actual time=0.143..0.143 rows=1 loops=7407)
                  ->  Function Scan on unnest u  (cost=0.00..1.25 rows=33 width=0) (actual time=0.036..0.124 rows=234 loops=7407)
                        Filter: (u < d_2.daily_val)
                        Rows Removed by Filter: 243
  ->  CTE Scan on dd1 d  (cost=0.00..9.99 rows=333 width=120) (actual time=10.450..1468.533 rows=7407 loops=1)
Planning time: 0.471 ms
Execution time: 1960.835 ms

引用自:https://dba.stackexchange.com/questions/165322