PostgreSQL won't use my covering index and falls back to a much slower bitmap scan
I'm trying to figure out why my table uses a bitmap heap scan when a plain index scan is much faster.
Table:
webarchive=# \d web_pages
                                     Table "public.web_pages"
      Column       |            Type             |                       Modifiers
-------------------+-----------------------------+---------------------------------------------------------------------
 id                | bigint                      | not null default nextval('web_pages_id_seq'::regclass)
 state             | dlstate_enum                | not null
 errno             | integer                     |
 url               | text                        | not null
 starturl          | text                        | not null
 netloc            | text                        | not null
 file              | bigint                      |
 priority          | integer                     | not null
 distance          | integer                     | not null
 is_text           | boolean                     |
 limit_netloc      | boolean                     |
 title             | citext                      |
 mimetype          | text                        |
 type              | itemtype_enum               |
 content           | text                        |
 fetchtime         | timestamp without time zone |
 addtime           | timestamp without time zone |
 normal_fetch_mode | boolean                     | default true
 ignoreuntiltime   | timestamp without time zone | not null default '1970-01-01 00:00:00'::timestamp without time zone
Indexes:
    "web_pages_pkey" PRIMARY KEY, btree (id)
    "ix_web_pages_url" UNIQUE, btree (url)
    "ix_web_pages_distance" btree (distance)
    "ix_web_pages_fetchtime" btree (fetchtime)
    "ix_web_pages_id" btree (id)
    "ix_web_pages_id_state" btree (id, state)
    "ix_web_pages_netloc" btree (netloc)
    "ix_web_pages_priority" btree (priority)
    "ix_web_pages_state" btree (state)
    "web_pages_netloc_fetchtime_idx" btree (netloc, fetchtime)
    "web_pages_netloc_id_idx" btree (netloc, id)
Foreign-key constraints:
    "web_pages_file_fkey" FOREIGN KEY (file) REFERENCES web_files(id)
Tablespace: "main_webarchive_tablespace"
Query:
EXPLAIN ANALYZE UPDATE web_pages SET state = 'new' WHERE (state = 'fetching' OR state = 'processing') AND id <= 150000000;
In this case, since I have a covering index (ix_web_pages_id_state), I would expect the query planner to just do an index scan. However, it produces a bitmap heap scan instead, which is much slower:

                                                                   QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on web_pages  (cost=524.06..532.09 rows=2 width=671) (actual time=2356.900..2356.900 rows=0 loops=1)
   ->  Bitmap Heap Scan on web_pages  (cost=524.06..532.09 rows=2 width=671) (actual time=2356.896..2356.896 rows=0 loops=1)
         Recheck Cond: (((state = 'fetching'::dlstate_enum) OR (state = 'processing'::dlstate_enum)) AND (id <= 150000000))
         Heap Blocks: exact=6
         ->  BitmapAnd  (cost=524.06..524.06 rows=2 width=0) (actual time=2353.388..2353.388 rows=0 loops=1)
               ->  BitmapOr  (cost=151.98..151.98 rows=6779 width=0) (actual time=2021.635..2021.636 rows=0 loops=1)
                     ->  Bitmap Index Scan on ix_web_pages_state  (cost=0.00..147.41 rows=6779 width=0) (actual time=2021.616..2021.617 rows=11668131 loops=1)
                           Index Cond: (state = 'fetching'::dlstate_enum)
                     ->  Bitmap Index Scan on ix_web_pages_state  (cost=0.00..4.57 rows=1 width=0) (actual time=0.015..0.016 rows=0 loops=1)
                           Index Cond: (state = 'processing'::dlstate_enum)
               ->  Bitmap Index Scan on web_pages_pkey  (cost=0.00..371.83 rows=16435 width=0) (actual time=0.046..0.047 rows=205 loops=1)
                     Index Cond: (id <= 150000000)
 Planning time: 0.232 ms
 Execution time: 2406.234 ms
(14 rows)
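For reference, this is how the planner's column statistics for state can be inspected (a sketch against the standard pg_stats view; the columns selected are just the ones I find useful):

-- Sketch: what the planner believes about the distribution of "state".
SELECT null_frac,
       n_distinct,
       most_common_vals,
       most_common_freqs
FROM   pg_stats
WHERE  schemaname = 'public'
  AND  tablename  = 'web_pages'
  AND  attname    = 'state';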
If I force it not to do a bitmap heap scan (via set enable_bitmapscan to off;), it produces a much faster plan:

                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Update on web_pages  (cost=0.56..38591.75 rows=2 width=671) (actual time=0.284..0.285 rows=0 loops=1)
   ->  Index Scan using web_pages_pkey on web_pages  (cost=0.56..38591.75 rows=2 width=671) (actual time=0.281..0.281 rows=0 loops=1)
         Index Cond: (id <= 150000000)
         Filter: ((state = 'fetching'::dlstate_enum) OR (state = 'processing'::dlstate_enum))
         Rows Removed by Filter: 181
 Planning time: 0.190 ms
 Execution time: 0.334 ms
(7 rows)
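As an aside, if I do end up having to force this, I'd rather not flip the setting for the whole session. Something like the following keeps it scoped to a single transaction (a sketch, not what I'm actually running):

-- Sketch: scope the planner override to one transaction instead of the whole session.
BEGIN;
SET LOCAL enable_bitmapscan = off;   -- reverts automatically at COMMIT/ROLLBACK
UPDATE web_pages
   SET state = 'new'
 WHERE (state = 'fetching' OR state = 'processing')
   AND id <= 150000000;
COMMIT;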
I re-ran a VACUUM ANALYZE to see whether the table statistics might be stale, but that didn't seem to help. Also, the results above are from after re-running the same query multiple times, so caching shouldn't be a factor either.
How can I induce the planner to produce a more efficient plan here?
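In case it's useful, this is roughly how I've been checking whether (auto)vacuum and (auto)analyze have actually run, and how many dead tuples the statistics collector thinks the table has (a sketch against the standard pg_stat_user_tables view):

-- Sketch: dead-tuple counts and the last (auto)vacuum / (auto)analyze timestamps.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       last_analyze,
       last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'web_pages';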
Edit: as suggested in the comments, I added an index "ix_web_pages_state_id" btree (state, id). Unfortunately, it didn't help. I also tried lowering random_page_cost (down as far as 0.5) and increasing the statistics target, but neither had any effect.
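For completeness, those tuning experiments looked roughly like this (a sketch; the statistics-target value of 1000 is illustrative rather than exactly what I used):

-- Sketch of the tuning experiments (values illustrative):
SET random_page_cost = 0.5;                       -- session-level only; default is 4.0
ALTER TABLE web_pages ALTER COLUMN state SET STATISTICS 1000;
ALTER TABLE web_pages ALTER COLUMN id    SET STATISTICS 1000;
ANALYZE web_pages;                                -- re-gather stats with the new target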
Further edit - removing the OR condition:

EXPLAIN ANALYZE UPDATE web_pages SET state = 'new' WHERE state = 'fetching' AND id <= 150000000;
yields:
                                                                QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on web_pages  (cost=311.83..315.84 rows=1 width=589) (actual time=2574.654..2574.655 rows=0 loops=1)
   ->  Bitmap Heap Scan on web_pages  (cost=311.83..315.84 rows=1 width=589) (actual time=2574.650..2574.651 rows=0 loops=1)
         Recheck Cond: ((id <= 150000000) AND (state = 'fetching'::dlstate_enum))
         Heap Blocks: exact=6
         ->  BitmapAnd  (cost=311.83..311.83 rows=1 width=0) (actual time=2574.556..2574.556 rows=0 loops=1)
               ->  Bitmap Index Scan on web_pages_pkey  (cost=0.00..49.60 rows=1205 width=0) (actual time=0.679..0.680 rows=726 loops=1)
                     Index Cond: (id <= 150000000)
               ->  Bitmap Index Scan on ix_web_pages_state  (cost=0.00..261.98 rows=7122 width=0) (actual time=2519.950..2519.951 rows=11873888 loops=1)
                     Index Cond: (state = 'fetching'::dlstate_enum)
Further edit - MOAR WEIRDNESS:
I rewrote the query to use a subquery:
EXPLAIN ANALYZE UPDATE web_pages SET state = 'new' WHERE (state = 'fetching' OR state = 'processing') AND id IN ( SELECT id FROM web_pages WHERE id <= 150000000 );
and the resulting query plan is better than everything else so far. Sometimes:
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on web_pages  (cost=1.12..13878.31 rows=1 width=595) (actual time=2.773..2.774 rows=0 loops=1)
   ->  Nested Loop  (cost=1.12..13878.31 rows=1 width=595) (actual time=2.772..2.773 rows=0 loops=1)
         ->  Index Scan using web_pages_pkey on web_pages web_pages_1  (cost=0.56..3533.34 rows=1205 width=14) (actual time=0.000..0.602 rows=181 loops=1)
               Index Cond: (id <= 150000000)
         ->  Index Scan using web_pages_pkey on web_pages  (cost=0.56..8.58 rows=1 width=585) (actual time=0.010..0.010 rows=0 loops=181)
               Index Cond: (id = web_pages_1.id)
               Filter: ((state = 'fetching'::dlstate_enum) OR (state = 'processing'::dlstate_enum))
               Rows Removed by Filter: 1
 Planning time: 0.891 ms
 Execution time: 2.894 ms
(10 rows)
And at other times:

 Update on web_pages  (cost=21193.19..48917.78 rows=2 width=595)
   ->  Hash Semi Join  (cost=21193.19..48917.78 rows=2 width=595)
         Hash Cond: (web_pages.id = web_pages_1.id)
         ->  Bitmap Heap Scan on web_pages  (cost=270.14..27976.00 rows=7126 width=585)
               Recheck Cond: ((state = 'fetching'::dlstate_enum) OR (state = 'processing'::dlstate_enum))
               ->  BitmapOr  (cost=270.14..270.14 rows=7126 width=0)
                     ->  Bitmap Index Scan on ix_web_pages_state  (cost=0.00..262.01 rows=7126 width=0)
                           Index Cond: (state = 'fetching'::dlstate_enum)
                     ->  Bitmap Index Scan on ix_web_pages_state  (cost=0.00..4.57 rows=1 width=0)
                           Index Cond: (state = 'processing'::dlstate_enum)
         ->  Hash  (cost=20834.15..20834.15 rows=7112 width=14)
               ->  Index Scan using web_pages_pkey on web_pages web_pages_1  (cost=0.56..20834.15 rows=7112 width=14)
                     Index Cond: ((id > 1883250000) AND (id <= 1883300000))
At this point, I have no idea what's going on. All I know is that every one of these cases is improved by set enable_bitmapscan to off;.

OK, the very long-running transaction I had going overnight has finished, and I managed to run a VACUUM VERBOSE ANALYZE on the table:

webarchive=# VACUUM ANALYZE VERBOSE web_pages;
INFO:  vacuuming "public.web_pages"
INFO:  scanned index "ix_web_pages_distance" to remove 33328301 row versions
DETAIL:  CPU 6.85s/21.21u sec elapsed 171.28 sec
INFO:  scanned index "ix_web_pages_fetchtime" to remove 33328301 row versions
DETAIL:  CPU 6.20s/25.28u sec elapsed 186.53 sec
INFO:  scanned index "ix_web_pages_id" to remove 33328301 row versions
DETAIL:  CPU 7.37s/29.56u sec elapsed 226.49 sec
INFO:  scanned index "ix_web_pages_netloc" to remove 33328301 row versions
DETAIL:  CPU 8.47s/41.44u sec elapsed 260.50 sec
INFO:  scanned index "ix_web_pages_priority" to remove 33328301 row versions
DETAIL:  CPU 5.65s/16.35u sec elapsed 180.78 sec
INFO:  scanned index "ix_web_pages_state" to remove 33328301 row versions
DETAIL:  CPU 4.51s/21.14u sec elapsed 189.60 sec
INFO:  scanned index "ix_web_pages_url" to remove 33328301 row versions
DETAIL:  CPU 26.59s/78.52u sec elapsed 969.99 sec
INFO:  scanned index "web_pages_netloc_fetchtime_idx" to remove 33328301 row versions
DETAIL:  CPU 8.23s/48.39u sec elapsed 301.37 sec
INFO:  scanned index "web_pages_netloc_id_idx" to remove 33328301 row versions
DETAIL:  CPU 15.52s/43.25u sec elapsed 423.68 sec
INFO:  scanned index "web_pages_pkey" to remove 33328301 row versions
DETAIL:  CPU 8.12s/33.43u sec elapsed 215.93 sec
INFO:  scanned index "ix_web_pages_id_state" to remove 33328301 row versions
DETAIL:  CPU 8.22s/33.26u sec elapsed 214.43 sec
INFO:  scanned index "ix_web_pages_state_id" to remove 33328301 row versions
DETAIL:  CPU 6.01s/28.04u sec elapsed 174.19 sec
INFO:  "web_pages": removed 33328301 row versions in 3408348 pages
DETAIL:  CPU 89.90s/50.24u sec elapsed 1928.70 sec
INFO:  index "ix_web_pages_distance" now contains 29463963 row versions in 215671 pages
DETAIL:  33328301 index row versions were removed.
         32914 index pages have been deleted, 0 are currently reusable.
         CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "ix_web_pages_fetchtime" now contains 29463982 row versions in 253375 pages
DETAIL:  33328301 index row versions were removed.
         40460 index pages have been deleted, 0 are currently reusable.
         CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "ix_web_pages_id" now contains 29464000 row versions in 238212 pages
DETAIL:  33328301 index row versions were removed.
         21081 index pages have been deleted, 0 are currently reusable.
         CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "ix_web_pages_netloc" now contains 29464025 row versions in 358150 pages
DETAIL:  33328301 index row versions were removed.
         99235 index pages have been deleted, 0 are currently reusable.
         CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "ix_web_pages_priority" now contains 29464032 row versions in 214923 pages
DETAIL:  33328301 index row versions were removed.
         21451 index pages have been deleted, 0 are currently reusable.
         CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "ix_web_pages_state" now contains 29466359 row versions in 215150 pages
DETAIL:  33328301 index row versions were removed.
         81340 index pages have been deleted, 0 are currently reusable.
         CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "ix_web_pages_url" now contains 29466350 row versions in 1137027 pages
DETAIL:  33197635 index row versions were removed.
         236405 index pages have been deleted, 0 are currently reusable.
         CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "web_pages_netloc_fetchtime_idx" now contains 29466381 row versions in 539255 pages
DETAIL:  33328301 index row versions were removed.
         220594 index pages have been deleted, 0 are currently reusable.
         CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "web_pages_netloc_id_idx" now contains 29466392 row versions in 501276 pages
DETAIL:  33328301 index row versions were removed.
         144217 index pages have been deleted, 0 are currently reusable.
         CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "web_pages_pkey" now contains 29466394 row versions in 236560 pages
DETAIL:  33173411 index row versions were removed.
         20559 index pages have been deleted, 0 are currently reusable.
         CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "ix_web_pages_id_state" now contains 29466415 row versions in 256699 pages
DETAIL:  33328301 index row versions were removed.
         27194 index pages have been deleted, 0 are currently reusable.
         CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "ix_web_pages_state_id" now contains 29466435 row versions in 244076 pages
DETAIL:  33328301 index row versions were removed.
         91918 index pages have been deleted, 0 are currently reusable.
         CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "web_pages": found 33339704 removable, 29367176 nonremovable row versions in 4224021 out of 4231795 pages
DETAIL:  2541 dead row versions cannot be removed yet.
         There were 2079389 unused item pointers.
         Skipped 0 pages due to buffer pins.
         0 pages are entirely empty.
         CPU 330.54s/537.34u sec elapsed 7707.90 sec.
INFO:  vacuuming "pg_toast.pg_toast_705758310"
INFO:  scanned index "pg_toast_705758310_index" to remove 7184381 row versions
DETAIL:  CPU 7.32s/13.70u sec elapsed 240.71 sec
INFO:  "pg_toast_705758310": removed 7184381 row versions in 2271192 pages
DETAIL:  CPU 62.81s/46.41u sec elapsed 1416.12 sec
INFO:  index "pg_toast_705758310_index" now contains 114558558 row versions in 338256 pages
DETAIL:  7184381 index row versions were removed.
         2033 index pages have been deleted, 0 are currently reusable.
         CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_705758310": found 7184381 removable, 40907769 nonremovable row versions in 11388831 out of 29033065 pages
DETAIL:  5 dead row versions cannot be removed yet.
         There were 74209 unused item pointers.
         Skipped 0 pages due to buffer pins.
         0 pages are entirely empty.
         CPU 433.26s/247.73u sec elapsed 8444.85 sec.
INFO:  analyzing "public.web_pages"
INFO:  "web_pages": scanned 600000 of 4232727 pages, containing 4191579 live rows and 4552 dead rows; 600000 rows in sample, 29569683 estimated total rows
VACUUM
It's still generating plans that aren't index-only, although the execution time is now much more in line with the index-only version. What I don't understand is why the behaviour changed so drastically. Is a single very long-running query really that expensive?
webarchive=# EXPLAIN ANALYZE UPDATE web_pages SET state = 'new' WHERE (state = 'fetching' OR state = 'processing') AND id IN ( SELECT id FROM web_pages WHERE id > 1883250000 AND id <= 1883300000 );
                                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on web_pages  (cost=36.00..9936.00 rows=1 width=594) (actual time=37.856..37.857 rows=0 loops=1)
   ->  Nested Loop Semi Join  (cost=36.00..9936.00 rows=1 width=594) (actual time=37.852..37.853 rows=0 loops=1)
         ->  Bitmap Heap Scan on web_pages  (cost=35.44..3167.00 rows=788 width=584) (actual time=23.984..31.489 rows=2321 loops=1)
               Recheck Cond: ((state = 'fetching'::dlstate_enum) OR (state = 'processing'::dlstate_enum))
               Heap Blocks: exact=2009
               ->  BitmapOr  (cost=35.44..35.44 rows=788 width=0) (actual time=22.347..22.348 rows=0 loops=1)
                     ->  Bitmap Index Scan on ix_web_pages_state  (cost=0.00..30.47 rows=788 width=0) (actual time=22.326..22.327 rows=9202 loops=1)
                           Index Cond: (state = 'fetching'::dlstate_enum)
                     ->  Bitmap Index Scan on ix_web_pages_state_id  (cost=0.00..4.57 rows=1 width=0) (actual time=0.017..0.017 rows=0 loops=1)
                           Index Cond: (state = 'processing'::dlstate_enum)
         ->  Index Scan using ix_web_pages_id_state on web_pages web_pages_1  (cost=0.56..8.58 rows=1 width=14) (actual time=0.001..0.001 rows=0 loops=2321)
               Index Cond: ((id = web_pages.id) AND (id > 1883250000) AND (id <= 1883300000))
 Planning time: 2.677 ms
 Execution time: 37.945 ms
(14 rows)
Interestingly, the value of the id offset seems to affect the planning:
webarchive=# EXPLAIN ANALYZE UPDATE web_pages SET state = 'new' WHERE (state = 'fetching' OR state = 'processing') AND id IN ( SELECT id FROM web_pages WHERE id > 149950000 AND id <= 150000000 );
                                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on web_pages  (cost=1.12..17.18 rows=1 width=594) (actual time=0.030..0.031 rows=0 loops=1)
   ->  Nested Loop  (cost=1.12..17.18 rows=1 width=594) (actual time=0.026..0.028 rows=0 loops=1)
         ->  Index Scan using ix_web_pages_id_state on web_pages web_pages_1  (cost=0.56..8.58 rows=1 width=14) (actual time=0.022..0.024 rows=0 loops=1)
               Index Cond: ((id > 149950000) AND (id <= 150000000))
         ->  Index Scan using ix_web_pages_id_state on web_pages  (cost=0.56..8.59 rows=1 width=584) (never executed)
               Index Cond: (id = web_pages_1.id)
               Filter: ((state = 'fetching'::dlstate_enum) OR (state = 'processing'::dlstate_enum))
 Planning time: 1.531 ms
 Execution time: 0.155 ms
(9 rows)
Does the query planner take the values of query parameters into account when planning? I had assumed planning was independent of the parameter values, but thinking about it now, using the values to improve the plan makes sense, so I can see it working that way.
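One way to see the difference would be a prepared statement: with a literal in the query text the planner always sees the actual value, whereas a prepared statement can eventually switch to a generic plan that ignores it (a sketch; the statement name is made up):

-- Sketch: compare planning with the parameter value in hand vs. a prepared statement.
PREPARE count_below(bigint) AS
    SELECT count(*) FROM web_pages WHERE id <= $1;

EXPLAIN EXECUTE count_below(150000000);   -- plan chosen for this particular value (at least initially)
DEALLOCATE count_below;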
Interestingly, the bitmap scan now seems to perform much better, too.
webarchive=# set enable_bitmapscan to off;
SET
webarchive=# EXPLAIN ANALYZE UPDATE web_pages SET state = 'new' WHERE (state = 'fetching' OR state = 'processing') AND id IN ( SELECT id FROM web_pages WHERE id > 1883250000 AND id <= 1883300000 );
                                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on web_pages  (cost=1.12..82226.59 rows=1 width=594) (actual time=66.993..66.994 rows=0 loops=1)
   ->  Nested Loop  (cost=1.12..82226.59 rows=1 width=594) (actual time=66.992..66.993 rows=0 loops=1)
         ->  Index Scan using web_pages_pkey on web_pages web_pages_1  (cost=0.56..21082.82 rows=7166 width=14) (actual time=0.055..20.206 rows=8567 loops=1)
               Index Cond: ((id > 1883250000) AND (id <= 1883300000))
         ->  Index Scan using web_pages_pkey on web_pages  (cost=0.56..8.52 rows=1 width=584) (actual time=0.004..0.004 rows=0 loops=8567)
               Index Cond: (id = web_pages_1.id)
               Filter: ((state = 'fetching'::dlstate_enum) OR (state = 'processing'::dlstate_enum))
               Rows Removed by Filter: 1
 Planning time: 1.963 ms
 Execution time: 67.112 ms
(10 rows)

webarchive=# set enable_bitmapscan to on;
SET
webarchive=# EXPLAIN ANALYZE UPDATE web_pages SET state = 'new' WHERE (state = 'fetching' OR state = 'processing') AND id IN ( SELECT id FROM web_pages WHERE id > 1883250000 AND id <= 1883300000 );
                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on web_pages  (cost=36.00..9936.00 rows=1 width=594) (actual time=23.331..23.331 rows=0 loops=1)
   ->  Nested Loop Semi Join  (cost=36.00..9936.00 rows=1 width=594) (actual time=23.327..23.328 rows=0 loops=1)
         ->  Bitmap Heap Scan on web_pages  (cost=35.44..3167.00 rows=788 width=584) (actual time=6.727..17.027 rows=1966 loops=1)
               Recheck Cond: ((state = 'fetching'::dlstate_enum) OR (state = 'processing'::dlstate_enum))
               Heap Blocks: exact=3825
               ->  BitmapOr  (cost=35.44..35.44 rows=788 width=0) (actual time=3.499..3.499 rows=0 loops=1)
                     ->  Bitmap Index Scan on ix_web_pages_state  (cost=0.00..30.47 rows=788 width=0) (actual time=3.471..3.472 rows=21996 loops=1)
                           Index Cond: (state = 'fetching'::dlstate_enum)
                     ->  Bitmap Index Scan on ix_web_pages_state_id  (cost=0.00..4.57 rows=1 width=0) (actual time=0.022..0.023 rows=0 loops=1)
                           Index Cond: (state = 'processing'::dlstate_enum)
         ->  Index Scan using ix_web_pages_id_state on web_pages web_pages_1  (cost=0.56..8.58 rows=1 width=14) (actual time=0.001..0.001 rows=0 loops=1966)
               Index Cond: ((id = web_pages.id) AND (id > 1883250000) AND (id <= 1883300000))
 Planning time: 0.774 ms
 Execution time: 23.425 ms
(14 rows)
So I think the problem was simply that the indexes contained a lot of rows that are no longer live, and filtering those out was the main time cost. The root issue here is (I think) the way the MVCC system interacts with the VACUUM system in the context of extremely long-running transactions.
It makes sense (in retrospect) that entries can't be removed from an index until every transaction that could possibly use that index has completed. From the documentation:
But there is an additional requirement for any table scan in PostgreSQL: it must verify that each retrieved row is "visible" to the query's MVCC snapshot, as discussed in Chapter 13. Visibility information is not stored in index entries, only in heap entries; so at first glance it would seem that every row retrieval would require a heap access anyway. And this is indeed the case, if the table row has been modified recently. However, for seldom-changing data there is a way around this problem.
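The "way around this problem" the documentation goes on to describe is the visibility map. How much of the table is currently marked all-visible can be read from pg_class (a sketch; the percentage is whatever it happens to be at the time):

-- Sketch: fraction of heap pages marked all-visible (what index-only scans rely on).
SELECT relpages,
       relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
FROM   pg_class
WHERE  relname = 'web_pages';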
In this case, I had kicked off a database dump and then gone ahead with a bunch of cleanup work (which involved a lot of row churn). That would lead to a lot of heap lookups for every index query, because the indexes were full of rows that had since been deleted.
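Had I thought to look while the dump was still running, something like this should have shown which session was holding the cleanup horizon back (a sketch; pg_stat_activity.backend_xmin exists in 9.4 and later):

-- Sketch: sessions pinning the oldest transaction-id horizon,
-- i.e. the ones preventing VACUUM from removing dead rows.
SELECT pid,
       state,
       xact_start,
       age(backend_xmin) AS xmin_age,
       left(query, 60)   AS query_snippet
FROM   pg_stat_activity
WHERE  backend_xmin IS NOT NULL
ORDER  BY age(backend_xmin) DESC;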
This is mostly conjecture, though, since I don't have the resources to try to reproduce the situation.
In any case, @jjanes's tip about long-running queries was the key that led me down the rabbit hole here.
This part is clearly messed up:
->  Bitmap Index Scan on ix_web_pages_state  (cost=0.00..147.41 rows=6779 width=0) (actual time=2021.616..2021.617 rows=11668131 loops=1)
      Index Cond: (state = 'fetching'::dlstate_enum)
It finds over a thousand times more rows than it thought it would. A VACUUM ANALYZE should have fixed that. Perhaps your vacuum isn't actually accomplishing much, because some long-open transaction is preventing it from removing dead tuples?
Running a VACUUM VERBOSE ANALYZE would shed some light on that.
You can also explore further with a test query:
EXPLAIN (ANALYZE,BUFFERS) SELECT COUNT(*) FROM web_pages WHERE state = 'fetching'
Try to get it to run both as a bitmap scan and as a regular index scan by fiddling with the enable_* parameters, so that we can compare the two reports. In particular, a held-open transaction will cause the actual row count of the bitmap index scan node to be higher, because it has to count rows before testing them for visibility, whereas a plain index scan discards invisible rows before incrementing its count.
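Concretely, something along these lines would produce the two reports to compare (a sketch; adjust the enable_* settings as needed to get each plan shape):

-- Sketch: run the same test query under both plan shapes and compare the
-- EXPLAIN (ANALYZE, BUFFERS) output, especially actual rows and buffer counts.
SET enable_bitmapscan = on;
SET enable_indexscan  = off;   -- steer the planner toward the bitmap scan
EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM web_pages WHERE state = 'fetching';

SET enable_bitmapscan = off;
SET enable_indexscan  = on;    -- now steer it toward a plain index scan
EXPLAIN (ANALYZE, BUFFERS) SELECT COUNT(*) FROM web_pages WHERE state = 'fetching';

RESET enable_bitmapscan;
RESET enable_indexscan;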