Speed up a query/calculation. Where can I add indexes or optimize the query or the server?
I want to speed up some calculations on a single table.
Here is the table. I believe it has over 93 million rows and it grows every day:
CREATE TABLE daily_data (
    id serial NOT NULL,
    company_id integer NOT NULL,
    trade_date date NOT NULL,
    daily_val numeric NOT NULL,
    bbg_pulls_id integer,
    gen_qtr_end_dt_id integer,
    ern_release_date_id integer,
    wh_calc_id integer,
    CONSTRAINT daily_data_pkey PRIMARY KEY (id),
    CONSTRAINT daily_data_bbg_pulls_id_fkey FOREIGN KEY (bbg_pulls_id)
        REFERENCES bbg_pulls (id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION,
    CONSTRAINT daily_data_company_id_fkey FOREIGN KEY (company_id)
        REFERENCES company (id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION,
    CONSTRAINT daily_data_ern_release_date_id_fkey FOREIGN KEY (ern_release_date_id)
        REFERENCES ern_dt (id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION,
    CONSTRAINT daily_data_wh_calc_id_fkey FOREIGN KEY (wh_calc_id)
        REFERENCES wh_calc (id) MATCH SIMPLE
        ON UPDATE NO ACTION ON DELETE NO ACTION
        DEFERRABLE INITIALLY IMMEDIATE,
    CONSTRAINT daily_data_company_id_trade_date_bbg_pulls_id_key UNIQUE (company_id, trade_date, bbg_pulls_id),
    CONSTRAINT daily_data_company_id_trade_date_wh_calc_id_key UNIQUE (company_id, trade_date, wh_calc_id),
    CONSTRAINT daily_data_check CHECK ((wh_calc_id IS NULL) <> (bbg_pulls_id IS NULL))
);

CREATE INDEX daily_data_bbg_pulls_id_idx ON daily_data USING btree (bbg_pulls_id)
    WHERE bbg_pulls_id IS NOT NULL;
CREATE INDEX daily_data_company_id_idx ON daily_data USING btree (company_id);
CREATE INDEX daily_data_gen_qtr_end_dt_id_idx ON daily_data USING btree (gen_qtr_end_dt_id)
    WHERE gen_qtr_end_dt_id IS NOT NULL;
CREATE INDEX daily_data_trade_date_idx ON daily_data USING btree (trade_date);
CREATE INDEX daily_data_wh_calc_id_idx ON daily_data USING btree (wh_calc_id)
    WHERE wh_calc_id IS NOT NULL;
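(As a side note, a quick way to check the current estimated row count and on-disk size of the table; pg_class.reltuples is only a planner estimate, not an exact count:)

-- Rough size check; reltuples is the planner's estimate, not an exact count.
SELECT reltuples::bigint AS estimated_rows
     , pg_size_pretty(pg_total_relation_size('daily_data')) AS total_size
FROM   pg_class
WHERE  relname = 'daily_data';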
Here is what I actually/ultimately want to do:
with dd2 as (
   select * from daily_data
   where wh_calc_id = 241  -- <- the 241 value is passed into a function where this is used
)
INSERT INTO daily_data (
    company_id
   ,trade_date
   ,daily_val
   ,wh_calc_id
)
SELECT d.company_id
      ,d.trade_date
      ,round(CASE WHEN x.ct = 0 THEN numeric '1' ELSE x.ct_lt / x.ct END, 6) AS pctl_calc
      ,1  -- <-- dummy value, value is passed into the function where this query is used
FROM dd2 d
   , LATERAL (
        SELECT count(daily_val) AS ct
             , count(daily_val < d.daily_val OR NULL)::numeric As ct_lt
        FROM dd2
        WHERE company_id = d.company_id  -- and company_id < 8
        AND   trade_date < d.trade_date
        and   wh_calc_id = d.wh_calc_id  -- and wh_calc_id = 241
     ) x
I am testing it by running this (basically everything except the insert), but it had been running for 16 hours before I cancelled it:
with dd2 as (
   select * from daily_data
   where wh_calc_id = 241
)
SELECT d.*
     , round(CASE WHEN x.ct = 0 THEN numeric '1' ELSE x.ct_lt / x.ct END, 6) AS pctl_calc
FROM dd2 d
   , LATERAL (
        SELECT count(daily_val) AS ct
             , count(daily_val < d.daily_val OR NULL)::numeric As ct_lt
        FROM dd2
        WHERE company_id = d.company_id  -- and company_id < 8
        AND   trade_date < d.trade_date
        and   wh_calc_id = d.wh_calc_id  -- and wh_calc_id = 241
     ) x
ORDER BY company_id, trade_date;
So I run it on a subset (AND company_id < 8) to get an EXPLAIN ANALYZE, which takes a little under 4 minutes:

EXPLAIN ANALYZE
with dd2 as (
   select * from daily_data
   where wh_calc_id = 241
   AND   company_id < 8
)
SELECT d.*
     , round(CASE WHEN x.ct = 0 THEN numeric '1' ELSE x.ct_lt / x.ct END, 6) AS pctl_calc
FROM dd2 d
   , LATERAL (
        SELECT count(daily_val) AS ct
             , count(daily_val < d.daily_val OR NULL)::numeric As ct_lt
        FROM dd2
        WHERE company_id = d.company_id  -- and company_id < 8
        AND   trade_date < d.trade_date
        and   wh_calc_id = d.wh_calc_id  -- and wh_calc_id = 241
     ) x
ORDER BY company_id, trade_date;
Here is the EXPLAIN ANALYZE output:
"Sort (cost=8.56..8.57 rows=1 width=100) (actual time=219363.049..219367.217 rows=24444 loops=1)" " Sort Key: d.company_id, d.trade_date" " Sort Method: external merge Disk: 1264kB" " CTE dd2" " -> Index Scan using daily_data_wh_calc_id_idx on daily_data (cost=0.43..8.46 rows=1 width=34) (actual time=0.415..70805.295 rows=24444 loops=1)" " Index Cond: (wh_calc_id = 241)" " Filter: (company_id < 8)" " Rows Removed by Filter: 8661143" " -> Nested Loop (cost=0.04..0.10 rows=1 width=100) (actual time=70835.311..219272.273 rows=24444 loops=1)" " -> CTE Scan on dd2 d (cost=0.00..0.02 rows=1 width=60) (actual time=0.423..64.374 rows=24444 loops=1)" " -> Aggregate (cost=0.04..0.05 rows=1 width=32) (actual time=8.965..8.965 rows=1 loops=24444)" " -> CTE Scan on dd2 (cost=0.00..0.03 rows=1 width=32) (actual time=5.201..8.203 rows=2422 loops=24444)" " Filter: ((trade_date < d.trade_date) AND (company_id = d.company_id) AND (wh_calc_id = d.wh_calc_id))" " Rows Removed by Filter: 22022" "Total runtime: 219374.219 ms"
Note that I used a CTE to try to make things run faster. Even when testing the query on a subset I run into memory errors and/or runs that never finish. Ultimately all of this will live inside a function, so the values hard-coded here are actually passed into that function, but I don't think that is relevant to the problem.
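To illustrate what I mean, here is a minimal sketch of such a wrapper; the function name pctl_calc_insert and the parameter names are placeholders I made up, not the real function:

-- Hypothetical wrapper, for illustration only: _source_wh_calc_id replaces the
-- hard-coded 241 and _target_wh_calc_id replaces the dummy value 1.
CREATE OR REPLACE FUNCTION pctl_calc_insert(_source_wh_calc_id int, _target_wh_calc_id int)
  RETURNS void
  LANGUAGE plpgsql AS
$func$
BEGIN
   WITH dd2 AS (
      SELECT * FROM daily_data WHERE wh_calc_id = _source_wh_calc_id
   )
   INSERT INTO daily_data (company_id, trade_date, daily_val, wh_calc_id)
   SELECT d.company_id
        , d.trade_date
        , round(CASE WHEN x.ct = 0 THEN numeric '1' ELSE x.ct_lt / x.ct END, 6)
        , _target_wh_calc_id
   FROM   dd2 d
        , LATERAL (
             SELECT count(daily_val) AS ct
                  , count(daily_val < d.daily_val OR NULL)::numeric AS ct_lt
             FROM   dd2
             WHERE  company_id = d.company_id
             AND    trade_date < d.trade_date
             AND    wh_calc_id = d.wh_calc_id
          ) x;
END
$func$;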
More information on this particular query/calculation is here: percent_rank that takes a sort parameter (or equivalent functionality) in PostgreSQL 9.3.5
I went here and tried to figure it out myself, without success: https://explain.depesz.com/
I don't see where I could add an index to speed this up.
Question 1: What can I do to speed up this query? I am only focused on the SELECT part of the INSERT, but if you have ideas on how to speed up the insert itself (other than dropping indexes), I'd be glad to hear them.
Question 2: From a server-resources point of view, is there anything I can do to make the calculation/query run faster?
SELECT version()

"PostgreSQL 9.3.5 on x86_64-suse-linux-gnu, compiled by gcc (SUSE Linux) 4.8.3 20140627 [gcc-4_8-branch revision 212064], 64-bit"

RAM: 2048 MB, CPU: 2 vCPUs, provisioned storage: 202.11 GB
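For context on question 2, these are the standard memory-related settings I would report alongside the hardware (just a trivial sketch of what to check, nothing query-specific):

-- Current memory-related settings (standard PostgreSQL parameters).
SHOW shared_buffers;
SHOW work_mem;
SHOW effective_cache_size;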
I know I could drop some indexes that this query doesn't use to speed up the insert. I'll do that later if needed. Right now I just want the SELECT part to run faster; the other indexes are used by other important queries.
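For the record, what I have in mind there is something along these lines (a sketch only, using two of the indexes defined above that this query does not touch; I have not actually done this yet):

-- Sketch: drop indexes the SELECT does not need, run the big INSERT, then rebuild them.
DROP INDEX daily_data_bbg_pulls_id_idx;
DROP INDEX daily_data_gen_qtr_end_dt_id_idx;

-- ... run the large INSERT here ...

CREATE INDEX daily_data_bbg_pulls_id_idx ON daily_data USING btree (bbg_pulls_id)
   WHERE bbg_pulls_id IS NOT NULL;
CREATE INDEX daily_data_gen_qtr_end_dt_id_idx ON daily_data USING btree (gen_qtr_end_dt_id)
   WHERE gen_qtr_end_dt_id IS NOT NULL;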
For testing I created a small sample table:
INSERT INTO daily_data (
    company_id
   ,trade_date
   ,daily_val
   ,wh_calc_id
)
SELECT (random() * 10)::int
     , '2010-01-01'::timestamp + (i::text || ' day')::interval
     , random() * 100
     , (random() * 20) + 230
FROM generate_series(1, 200000) AS i;
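One routine step after bulk-loading the sample rows, so the planner works with fresh statistics (not specific to this problem):

-- Refresh planner statistics after the bulk insert of test data.
ANALYZE daily_data;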
Original query:
EXPLAIN ANALYZE
with dd2 as (
   select * from daily_data
   where wh_calc_id = 241
   AND   company_id < 8
)
SELECT d.*
     , ct, ct_lt
     , round(CASE WHEN x.ct = 0 THEN numeric '1' ELSE x.ct_lt / x.ct END, 6) AS pctl_calc
FROM dd2 d
   , LATERAL (
        SELECT count(daily_val) AS ct
             , count(daily_val < d.daily_val OR NULL)::numeric As ct_lt
        FROM dd2
        WHERE company_id = d.company_id  -- and company_id < 8
        AND   trade_date < d.trade_date
        and   wh_calc_id = d.wh_calc_id  -- and wh_calc_id = 241
     ) x
ORDER BY company_id, trade_date;
Subquery version:
with dd2 as (
   select company_id, trade_date
        , (SELECT count(daily_val)
           FROM   daily_data
           WHERE  company_id = d.company_id
           AND    trade_date < d.trade_date
           and    wh_calc_id = d.wh_calc_id) AS ct
        , (SELECT count(daily_val < d.daily_val OR NULL)::numeric
           FROM   daily_data
           WHERE  company_id = d.company_id
           AND    trade_date < d.trade_date
           and    wh_calc_id = d.wh_calc_id) AS ct_lt
   from daily_data AS d
   where wh_calc_id = 241
   AND   company_id < 8
)
SELECT d.*
     , round(CASE WHEN d.ct = 0 THEN numeric '1' ELSE d.ct_lt / d.ct END, 6) AS pctl_calc
FROM dd2 d
ORDER BY company_id, trade_date;
Partial window + partial subquery version:
EXPLAIN ANALYZE
with dd2 as (
   select *
        , count(daily_val) OVER (PARTITION BY company_id, wh_calc_id ORDER BY trade_date) - 1 AS ct
        , (SELECT count(daily_val < d.daily_val OR NULL)::numeric
           FROM   daily_data
           WHERE  company_id = d.company_id
           AND    trade_date < d.trade_date
           and    wh_calc_id = d.wh_calc_id) AS ct_lt
   from daily_data AS d
   where wh_calc_id = 241
   AND   company_id < 8
)
SELECT d.*
     , round(CASE WHEN ct = 0 THEN numeric '1' ELSE ct_lt / ct END, 6) AS pctl_calc
FROM dd2 d
ORDER BY company_id, trade_date;
Pure window version (but somewhat tricky):
EXPLAIN ANALYZE
with dd2 as (
   select d.company_id, d.trade_date, d.daily_val
        , count(daily_val)     OVER w - 1 AS ct
        , array_agg(daily_val) OVER w     AS ct_lt2
   from daily_data AS d
   where wh_calc_id = 241
   AND   company_id < 8
   WINDOW w AS (PARTITION BY company_id, wh_calc_id ORDER BY trade_date)
)
, dd1 AS (
   SELECT d.*
        , (SELECT count(*) FILTER (WHERE u < d.daily_val) FROM unnest(ct_lt2) AS u) AS ct_lt
   FROM dd2 d
)
SELECT *
     , round(CASE WHEN ct = 0 THEN numeric '1' ELSE ct_lt / ct END, 6) AS pctl_calc
FROM dd1 AS d
ORDER BY company_id, trade_date;
Original query:
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2653.28..2653.84 rows=226 width=132) (actual time=8399.345..8399.886 rows=7540 loops=1)
   Sort Key: d.company_id, d.trade_date
   Sort Method: quicksort  Memory: 1253kB
   CTE dd2
     ->  Bitmap Heap Scan on daily_data  (cost=21.55..1224.03 rows=226 width=60) (actual time=1.026..4.794 rows=7540 loops=1)
           Recheck Cond: (wh_calc_id = 241)
           Filter: (company_id < 8)
           Rows Removed by Filter: 2526
           Heap Blocks: exact=1470
           ->  Bitmap Index Scan on daily_data_wh_calc_id_idx  (cost=0.00..21.50 rows=677 width=0) (actual time=0.848..0.848 rows=10066 loops=1)
                 Index Cond: (wh_calc_id = 241)
   ->  Nested Loop  (cost=6.22..1420.41 rows=226 width=132) (actual time=6.196..8391.961 rows=7540 loops=1)
         ->  CTE Scan on dd2 d  (cost=0.00..4.52 rows=226 width=60) (actual time=1.029..1.933 rows=7540 loops=1)
         ->  Aggregate  (cost=6.22..6.24 rows=1 width=40) (actual time=1.111..1.111 rows=1 loops=7540)
               ->  CTE Scan on dd2  (cost=0.00..6.22 rows=1 width=32) (actual time=0.003..0.995 rows=484 loops=7540)
                     Filter: ((trade_date < d.trade_date) AND (company_id = d.company_id) AND (wh_calc_id = d.wh_calc_id))
                     Rows Removed by Filter: 7056
 Planning time: 0.345 ms
 Execution time: 8400.233 ms
(19 rows)
Subquery version:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=9949.69..9950.25 rows=226 width=80) (actual time=10558.505..10559.017 rows=7540 loops=1)
   Sort Key: d.company_id, d.trade_date
   Sort Method: quicksort  Memory: 782kB
   CTE dd2
     ->  Bitmap Heap Scan on daily_data d_1  (cost=21.55..9934.07 rows=226 width=48) (actual time=1.138..10538.523 rows=7540 loops=1)
           Recheck Cond: (wh_calc_id = 241)
           Filter: (company_id < 8)
           Rows Removed by Filter: 2526
           Heap Blocks: exact=1470
           ->  Bitmap Index Scan on daily_data_wh_calc_id_idx  (cost=0.00..21.50 rows=677 width=0) (actual time=0.945..0.945 rows=10066 loops=1)
                 Index Cond: (wh_calc_id = 241)
           SubPlan 1
             ->  Aggregate  (cost=19.26..19.27 rows=1 width=8) (actual time=0.722..0.722 rows=1 loops=7540)
                   ->  Index Scan using daily_data_company_id_trade_date_wh_calc_id_key on daily_data  (cost=0.42..19.26 rows=1 width=32) (actual time=0.007..0.671 rows=484 loops=7540)
                         Index Cond: ((company_id = d_1.company_id) AND (trade_date < d_1.trade_date) AND (wh_calc_id = d_1.wh_calc_id))
           SubPlan 2
             ->  Aggregate  (cost=19.26..19.27 rows=1 width=32) (actual time=0.672..0.672 rows=1 loops=7540)
                   ->  Index Scan using daily_data_company_id_trade_date_wh_calc_id_key on daily_data daily_data_1  (cost=0.42..19.26 rows=1 width=32) (actual time=0.006..0.551 rows=484 loops=7540)
                         Index Cond: ((company_id = d_1.company_id) AND (trade_date < d_1.trade_date) AND (wh_calc_id = d_1.wh_calc_id))
   ->  CTE Scan on dd2 d  (cost=0.00..6.78 rows=226 width=80) (actual time=1.142..10552.746 rows=7540 loops=1)
 Planning time: 0.248 ms
 Execution time: 10559.387 ms
(22 rows)
Partial window + partial subquery version:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=5609.72..5610.28 rows=226 width=132) (actual time=4931.963..4932.433 rows=7540 loops=1)
   Sort Key: d.company_id, d.trade_date
   Sort Method: quicksort  Memory: 1253kB
   CTE dd2
     ->  WindowAgg  (cost=1232.87..5594.10 rows=226 width=100) (actual time=9.498..4914.892 rows=7540 loops=1)
           ->  Sort  (cost=1232.87..1233.43 rows=226 width=60) (actual time=9.468..11.291 rows=7540 loops=1)
                 Sort Key: d_1.company_id, d_1.trade_date
                 Sort Method: quicksort  Memory: 782kB
                 ->  Bitmap Heap Scan on daily_data d_1  (cost=21.55..1224.03 rows=226 width=60) (actual time=1.002..5.251 rows=7540 loops=1)
                       Recheck Cond: (wh_calc_id = 241)
                       Filter: (company_id < 8)
                       Rows Removed by Filter: 2526
                       Heap Blocks: exact=1470
                       ->  Bitmap Index Scan on daily_data_wh_calc_id_idx  (cost=0.00..21.50 rows=677 width=0) (actual time=0.814..0.814 rows=10066 loops=1)
                             Index Cond: (wh_calc_id = 241)
           SubPlan 1
             ->  Aggregate  (cost=19.26..19.27 rows=1 width=32) (actual time=0.649..0.649 rows=1 loops=7540)
                   ->  Index Scan using daily_data_company_id_trade_date_wh_calc_id_key on daily_data  (cost=0.42..19.26 rows=1 width=32) (actual time=0.005..0.529 rows=484 loops=7540)
                         Index Cond: ((company_id = d_1.company_id) AND (trade_date < d_1.trade_date) AND (wh_calc_id = d_1.wh_calc_id))
   ->  CTE Scan on dd2 d  (cost=0.00..6.78 rows=226 width=132) (actual time=9.504..4928.264 rows=7540 loops=1)
 Planning time: 0.223 ms
 Execution time: 4932.854 ms
(22 rows)
Pure window version (but somewhat tricky):
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1601.04..1601.61 rows=226 width=120) (actual time=2006.537..2018.331 rows=7540 loops=1)
   Sort Key: d.company_id, d.trade_date
   Sort Method: quicksort  Memory: 63526kB
   CTE dd2
     ->  WindowAgg  (cost=1232.87..1239.08 rows=226 width=84) (actual time=8.298..86.164 rows=7540 loops=1)
           ->  Sort  (cost=1232.87..1233.43 rows=226 width=44) (actual time=8.285..9.752 rows=7540 loops=1)
                 Sort Key: d_1.company_id, d_1.trade_date
                 Sort Method: quicksort  Memory: 782kB
                 ->  Bitmap Heap Scan on daily_data d_1  (cost=21.55..1224.03 rows=226 width=44) (actual time=0.935..5.078 rows=7540 loops=1)
                       Recheck Cond: (wh_calc_id = 241)
                       Filter: (company_id < 8)
                       Rows Removed by Filter: 2526
                       Heap Blocks: exact=1470
                       ->  Bitmap Index Scan on daily_data_wh_calc_id_idx  (cost=0.00..21.50 rows=677 width=0) (actual time=0.722..0.722 rows=10066 loops=1)
                             Index Cond: (wh_calc_id = 241)
   CTE dd1
     ->  CTE Scan on dd2 d_2  (cost=0.00..346.34 rows=226 width=88) (actual time=8.315..1573.920 rows=7540 loops=1)
           SubPlan 2
             ->  Aggregate  (cost=1.50..1.51 rows=1 width=8) (actual time=0.155..0.156 rows=1 loops=7540)
                   ->  Function Scan on unnest u  (cost=0.00..1.00 rows=100 width=32) (actual time=0.034..0.058 rows=485 loops=7540)
   ->  CTE Scan on dd1 d  (cost=0.00..6.78 rows=226 width=120) (actual time=8.321..1774.450 rows=7540 loops=1)
 Planning time: 0.177 ms
 Execution time: 2443.855 ms
(23 rows)
Conclusion
The window version is the best one. I found no simple way to express the ct_lt calculation as a window function. As a workaround, I aggregate the values into an array and count ct_lt from that. This probably adds some overhead on large amounts of data, but even so this version should still beat the others.

Update:
Adapted for 9.3 (the aggregate FILTER clause used above requires 9.4):
EXPLAIN ANALYZE
with dd2 as (
   select d.company_id, d.trade_date, d.daily_val
        , count(daily_val)     OVER w - 1 AS ct
        , array_agg(daily_val) OVER w     AS ct_lt2
   from daily_data AS d
   where wh_calc_id = 241
   AND   company_id < 8
   WINDOW w AS (PARTITION BY company_id, wh_calc_id ORDER BY trade_date)
)
, dd1 AS (
   SELECT d.*
        , (SELECT count(*) FROM unnest(ct_lt2) AS u WHERE u < d.daily_val) AS ct_lt
   FROM dd2 d
)
SELECT *
     , round(CASE WHEN ct = 0 THEN numeric '1' ELSE ct_lt::numeric / ct END, 6) AS pctl_calc
FROM dd1 AS d
ORDER BY company_id, trade_date;
Index:
CREATE INDEX daily_data_i1_idx ON daily_data (wh_calc_id, company_id, trade_date);

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1311.98..1312.82 rows=333 width=120) (actual time=1597.558..1613.481 rows=7407 loops=1)
   Sort Key: d.company_id, d.trade_date
   Sort Method: quicksort  Memory: 61659kB
   CTE dd2
     ->  WindowAgg  (cost=824.34..833.50 rows=333 width=84) (actual time=10.416..87.668 rows=7407 loops=1)
           ->  Sort  (cost=824.34..825.17 rows=333 width=44) (actual time=10.381..11.851 rows=7407 loops=1)
                 Sort Key: d_1.company_id, d_1.trade_date
                 Sort Method: quicksort  Memory: 771kB
                 ->  Bitmap Heap Scan on daily_data d_1  (cost=11.83..810.39 rows=333 width=44) (actual time=2.934..6.548 rows=7407 loops=1)
                       Recheck Cond: ((wh_calc_id = 241) AND (company_id < 8))
                       Heap Blocks: exact=1460
                       ->  Bitmap Index Scan on daily_data_i1_idx  (cost=0.00..11.75 rows=333 width=0) (actual time=2.661..2.661 rows=7407 loops=1)
                             Index Cond: ((wh_calc_id = 241) AND (company_id < 8))
   CTE dd1
     ->  CTE Scan on dd2 d_2  (cost=0.00..454.55 rows=333 width=88) (actual time=10.442..1320.450 rows=7407 loops=1)
           SubPlan 2
             ->  Aggregate  (cost=1.33..1.34 rows=1 width=8) (actual time=0.143..0.143 rows=1 loops=7407)
                   ->  Function Scan on unnest u  (cost=0.00..1.25 rows=33 width=0) (actual time=0.036..0.124 rows=234 loops=7407)
                         Filter: (u < d_2.daily_val)
                         Rows Removed by Filter: 243
   ->  CTE Scan on dd1 d  (cost=0.00..9.99 rows=333 width=120) (actual time=10.450..1468.533 rows=7407 loops=1)
 Planning time: 0.471 ms
 Execution time: 1960.835 ms
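Finally, a sketch of how this winning query could be plugged back into the INSERT from the question. The literal 241 and the target wh_calc_id of 1 are the same placeholders the question uses (in the real function they would be parameters), and the company_id < 8 test filter is dropped; I have not run this against the full 93M-row table:

-- Sketch only: the pure-window SELECT feeding the original INSERT.
-- 241 and 1 are the question's placeholder values; the test filter company_id < 8 is removed.
WITH dd2 AS (
   SELECT d.company_id, d.trade_date, d.daily_val
        , count(daily_val)     OVER w - 1 AS ct
        , array_agg(daily_val) OVER w     AS ct_lt2
   FROM   daily_data AS d
   WHERE  wh_calc_id = 241
   WINDOW w AS (PARTITION BY company_id, wh_calc_id ORDER BY trade_date)
)
, dd1 AS (
   SELECT d.*
        , (SELECT count(*) FROM unnest(ct_lt2) AS u WHERE u < d.daily_val) AS ct_lt
   FROM   dd2 d
)
INSERT INTO daily_data (company_id, trade_date, daily_val, wh_calc_id)
SELECT company_id
     , trade_date
     , round(CASE WHEN ct = 0 THEN numeric '1' ELSE ct_lt::numeric / ct END, 6)
     , 1
FROM   dd1;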