Postgres query with a large IN list, and joining against a temporary table doesn't seem to work
Edit: the query plans in the question body come from EXPLAIN, but as @jjanes suggested, EXPLAIN (ANALYZE, BUFFERS) is probably more useful. Since the output is very large, I uploaded it here: https://gist.github.com/vr2262/ab3cfb69ac758b5161e27d9cb77ad05f

I have a query that selects records from a table using a WHERE ... IN clause on an indexed bigint column. Up to a certain number of values (7835, as it happens) the query is fast (around 150 ms for sequential IDs, around 1 second for random IDs), but adding one more value results in a different query plan, and the query takes about 150 seconds. I looked around for other answers, and the solution suggested by https://dba.stackexchange.com/a/91254 (and elsewhere) is to insert the values into an indexed temporary table and join against it. In practice, however, that actually made the query slightly slower.
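For anyone trying to reproduce the plans in the gist, EXPLAIN (ANALYZE, BUFFERS) is just prefixed to the statement itself. A minimal sketch using the table and column names from the query below, with the IN list shortened for readability (note that the ANALYZE option actually executes the query):

EXPLAIN (ANALYZE, BUFFERS)
SELECT my_table.id
FROM my_table
WHERE my_table.joined_table_2_id = 1
  AND my_table.big_where_id IN (1, 2, 3);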
Here is the original query:
SELECT
    my_table.id AS my_table_id,
    my_table.joined_table_2_id AS my_table_joined_table_2_id,
    my_table.big_where_id AS my_table_big_where_id,
    ST_AsGeoJSON(my_table.geog) AS unrelated_geog,
    joined_table_1.id AS joined_table_1_id
FROM my_table
LEFT OUTER JOIN joined_table_a AS joined_table_1
    ON my_table.id = joined_table_1.my_table_id
LEFT OUTER JOIN joined_table_b AS joined_table_2
    ON joined_table_2.id = my_table.joined_table_2_id
WHERE my_table.joined_table_2_id = 1
  AND my_table.big_where_id IN (1, 2, 3, ..., 7835);
...and the corresponding fast query plan:
Gather  (cost=36576.06..15864926.71 rows=44743 width=139)
  Workers Planned: 2
  ->  Hash Left Join  (cost=35576.06..15859452.41 rows=18643 width=139)
        Hash Cond: (my_table.joined_table_2_id = joined_table_2.id)
        ->  Nested Loop Left Join  (cost=35574.99..15854534.26 rows=18643 width=246)
              ->  Parallel Bitmap Heap Scan on my_table  (cost=35574.42..89742.05 rows=2845 width=201)
                    Recheck Cond: ((joined_table_2_id = 1) AND (big_where_id = ANY ('{1,2,3,...}'::bigint[])))
                    ->  Bitmap Index Scan on my_table_joined_table_2_id_big_where_id_key  (cost=0.00..35572.71 rows=6829 width=0)
                          Index Cond: ((joined_table_2_id = 1) AND (big_where_id = ANY ('{1,2,3,...}'::bigint[])))
              ->  Index Scan using ix_joined_table_a_my_table_id on joined_table_a joined_table_1  (cost=0.57..5512.89 rows=2834 width=53)
                    Index Cond: (my_table_id = my_table.id)
        ->  Hash  (cost=1.05..1.05 rows=1 width=14)
              ->  Seq Scan on joined_table_b joined_table_2  (cost=0.00..1.05 rows=1 width=14)
                    Filter: (id = 1)
With one more big_where_id value, the query plan becomes:

Hash Left Join  (cost=50982.39..15870462.06 rows=44750 width=139)
  Hash Cond: (my_table.joined_table_2_id = joined_table_2.id)
  ->  Hash Right Join  (cost=50981.33..15858658.19 rows=44750 width=246)
        Hash Cond: (joined_table_1.my_table_id = my_table.id)
        ->  Seq Scan on joined_table_a joined_table_1  (cost=0.00..14184914.72 rows=618195072 width=53)
        ->  Hash  (cost=50895.95..50895.95 rows=6830 width=201)
              ->  Index Scan using my_table_joined_table_2_id_big_where_id_key on my_table  (cost=0.57..50895.95 rows=6830 width=201)
                    Index Cond: ((joined_table_2_id = 1) AND (big_where_id = ANY ('{1,2,3,...}'::bigint[])))
  ->  Hash  (cost=1.05..1.05 rows=1 width=14)
        ->  Seq Scan on joined_table_b joined_table_2  (cost=0.00..1.05 rows=1 width=14)
              Filter: (id = 1)
I tried using a temporary table like this:
CREATE TEMPORARY TABLE temp_table (id INTEGER PRIMARY KEY);
INSERT INTO temp_table (id) SELECT generate_series(1, 7836);

SELECT
    my_table.id AS my_table_id,
    my_table.joined_table_2_id AS my_table_joined_table_2_id,
    my_table.big_where_id AS my_table_big_where_id,
    ST_AsGeoJSON(my_table.geog) AS unrelated_geog,
    joined_table_1.id AS joined_table_1_id
FROM my_table
LEFT OUTER JOIN joined_table_a AS joined_table_1
    ON my_table.id = joined_table_1.my_table_id
LEFT OUTER JOIN joined_table_b AS joined_table_2
    ON joined_table_2.id = my_table.joined_table_2_id
JOIN temp_table ON my_table.big_where_id = temp_table.id
WHERE my_table.joined_table_2_id = 1;
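An aside I have not verified for this case: autovacuum does not process temporary tables, so unless statistics are collected explicitly the planner has to guess at temp_table's row count. It might be worth running ANALYZE between the INSERT and the SELECT:

-- collect statistics so the planner knows how many IDs are in the temp table
ANALYZE temp_table;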
...but, as mentioned, it was slightly slower than before. Here is the query plan (from EXPLAIN on the SELECT):

Hash Left Join  (cost=126858.69..28741416.19 rows=138238 width=139)
  Hash Cond: (my_table.joined_table_2_id = joined_table_2.id)
  ->  Hash Right Join  (cost=126857.60..28706108.24 rows=138238 width=246)
        Hash Cond: (joined_table_1.my_table_id = my_table.id)
        ->  Seq Scan on joined_table_a joined_table_1  (cost=0.00..14184914.72 rows=618195072 width=53)
        ->  Hash  (cost=125995.86..125995.86 rows=21099 width=201)
              ->  Nested Loop  (cost=0.57..125995.86 rows=21099 width=201)
                    ->  Seq Scan on temp_table  (cost=0.00..159.75 rows=11475 width=4)
                    ->  Index Scan using ix_my_table_big_where_id on my_table  (cost=0.57..10.95 rows=2 width=201)
                          Index Cond: (big_where_id = temp_table.id)
  ->  Hash  (cost=1.04..1.04 rows=4 width=14)
        ->  Seq Scan on joined_table_b joined_table_2  (cost=0.00..1.04 rows=4 width=14)
Maybe a plain JOIN on the temporary table is the wrong approach? I had no luck trying other kinds of joins either, though.
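For reference, another shape this kind of filter is sometimes written in is a join against an unnested array rather than a temporary table. A sketch only, with a short array literal standing in for the full list of IDs; whether it changes the plan here is an open question:

SELECT my_table.id AS my_table_id
FROM my_table
JOIN unnest(ARRAY[1, 2, 3]::bigint[]) AS ids(big_where_id)
    ON my_table.big_where_id = ids.big_where_id
WHERE my_table.joined_table_2_id = 1;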
This looks like a case of unfair measurement. You are presumably executing the same query over and over, just adding one more element to the IN list each time. But that means almost all of the data the "fast" plan needs is in constant use and already cached. If you changed the parameter tested against joined_table_2_id on every execution (rather than using 1 the whole time), or picked a different set of roughly 7000 random values for the IN list on every execution instead of just using the series 1..7NNN, would the fast plan still be fast?
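A sketch of one way to pull a fresh batch of roughly 7000 random big_where_id values for such a re-test (this samples values that actually exist in my_table; ORDER BY random() is slow on big tables, but fine for a one-off experiment):

-- grab ~7000 random existing IDs to paste into the IN list for the next run
SELECT big_where_id
FROM my_table
ORDER BY random()
LIMIT 7000;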
If it is still much faster than the alternative even with random parameters, that indicates that random_page_cost is set too high relative to seq_page_cost for the storage system you have. The default settings for those (4 and 1) are generally appropriate for hard drives, not for SSDs.
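A sketch of how one might check those settings and, as a session-local experiment, try an SSD-like value before re-running the query (1.1 is a commonly used ballpark for random_page_cost on SSDs; that number is an assumption here, not a measurement):

SHOW random_page_cost;   -- default 4
SHOW seq_page_cost;      -- default 1

-- affects only the current session; re-run the query afterwards to compare plans
SET random_page_cost = 1.1;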