Postgresql
為什麼優化器不在我的表上使用聚群索引?
我有這張桌子
Table "public.lineitem" Column | Type | Collation | Nullable | Default -----------------+---------------+-----------+----------+--------- l_orderkey | integer | | | l_partkey | integer | | | l_suppkey | integer | | | l_linenumber | integer | | | l_quantity | integer | | | l_extendedprice | numeric(12,2) | | | l_discount | numeric(12,2) | | | l_tax | numeric(12,2) | | | l_returnflag | character(1) | | | l_linestatus | character(1) | | | l_shipdate | date | | | l_commitdate | date | | | l_receiptdate | date | | | l_shipinstruct | character(25) | | | l_shipmode | character(10) | | | l_comment | character(44) | | | l_partsuppkey | character(20) | | | Indexes: "l_shipdate_c_idx" btree (l_shipdate) CLUSTER "l_shipmode_h_idx" hash (l_shipdate) Foreign-key constraints: "lineitem_l_orderkey_fkey" FOREIGN KEY (l_orderkey) REFERENCES orders(o_orderkey) "lineitem_l_partkey_fkey" FOREIGN KEY (l_partkey) REFERENCES part(p_partkey) "lineitem_l_partsuppkey_fkey" FOREIGN KEY (l_partsuppkey) REFERENCES partsupp(ps_partsuppkey) "lineitem_l_suppkey_fkey" FOREIGN KEY (l_suppkey) REFERENCES supplier(s_suppkey)
這個查詢:
explain analyze select l_returnflag, l_linestatus, sum(l_quantity) as sum_qty, sum(l_extendedprice) as sum_base_price, sum(l_extendedprice*(1 - l_discount)) as sum_disc_price, sum(l_extendedprice*(1 - l_discount)*(1 + l_tax)) as sum_charge, avg(l_quantity) as avg_qty, avg(l_extendedprice) as avg_price, avg(l_discount) as avg_disc, count(*) as count_order from lineitem where l_shipdate<='31/08/1998' GROUP by l_returnflag, l_linestatus ORDER by l_returnflag, l_linestatus
返回此查詢計劃:
"Finalize GroupAggregate (cost=2631562.25..2631564.19 rows=6 width=212) (actual time=28624.012..28624.466 rows=4 loops=1)" " Group Key: l_returnflag, l_linestatus" " -> Gather Merge (cost=2631562.25..2631563.65 rows=12 width=212) (actual time=28623.998..28624.442 rows=12 loops=1)" " Workers Planned: 2" " Workers Launched: 2" " -> Sort (cost=2630562.23..2630562.24 rows=6 width=212) (actual time=28620.633..28620.633 rows=4 loops=3)" " Sort Key: l_returnflag, l_linestatus" " Sort Method: quicksort Memory: 27kB" " Worker 0: Sort Method: quicksort Memory: 27kB" " Worker 1: Sort Method: quicksort Memory: 27kB" " -> Partial HashAggregate (cost=2630562.03..2630562.15 rows=6 width=212) (actual time=28620.607..28620.611 rows=4 loops=3)" " Group Key: l_returnflag, l_linestatus" " Batches: 1 Memory Usage: 24kB" " Worker 0: Batches: 1 Memory Usage: 24kB" " Worker 1: Batches: 1 Memory Usage: 24kB" " -> Parallel Seq Scan on lineitem (cost=0.00..1707452.35 rows=24616258 width=24) (actual time=0.549..19028.353 rows=19701655 loops=3)" " Filter: (l_shipdate <= '1998-08-31'::date)" " Rows Removed by Filter: 293696" "Planning Time: 0.374 ms" "Execution Time: 28624.523 ms"
- 為什麼優化器更喜歡順序掃描
lineitem
而不是使用表l_shipdate_c_idx
?我應該放棄它嗎?Postgres 版本:
PostgreSQL 14.2 on x86_64-apple-darwin20.6.0, compiled by Apple clang version 12.0.0 (clang-1200.0.32.29), 64-bit
你的過濾器
l_shipdate<='31/08/1998'
不是很有選擇性,我們可以從計劃中看到它只負責刪除 293,696 行,最終需要使用 19,701,655。如果要使用索引一一讀取這些行,它可能會比順序掃描表慢得多。
我應該放棄它嗎?
如果這是您正在執行的唯一查詢並且正在使用的唯一過濾器,那麼可能。否則,沒有足夠的資訊繼續下去。如果您想查看某一天的行,該索引可能會很有用。索引中包含一些額外的列可能會更好。不可能在不知道您的應用程序的情況下說。