當沒有結果並且指定了 LIMIT 時，SELECT 非常慢

June 20, 2018

我遇到了一個查詢非常慢的問題，因為當最終結果的數量為 0並且指定了子句SELECT時它不使用索引。LIMIT

如果結果數大於 0，則 Postgres 使用索引，結果在 ~ 1ms 內返回。據我所知，這似乎總是正確的。

如果結果數為 0 並且沒有LIMIT使用，Postgres 使用索引並且結果在 ~ 1ms 內返回

如果結果數為 0 並且LIMIT指定了 a，則 Postgres 會進行順序掃描，結果大約需要 13,000 毫秒。

為什麼PostgreSQL在最後一種情況下不使用索引？

基數：

總共約 2100 萬行。

~ 300 萬行WHERE related_id=1

~ 300 萬行WHERE related_id=1 AND platform=p1

2 行WHERE related_id=1 AND platform=p2

0 行WHERE related_id=1 AND platform=p3

~ 800 萬行WHERE platform=p2

Postgres 版本：9.4.6

表架構：

CREATE TYPE platforms AS ENUM ('p1', 'p2', 'p3');

CREATE TABLE mytable (
   id bigint NOT NULL DEFAULT nextval('mytable_sq'::regclass),
   related_id integer NOT NULL,
   platform platforms NOT NULL DEFAULT 'default'::platforms,
   name character varying(200) NOT NULL,
   CONSTRAINT mytable_pkey PRIMARY KEY (id),
   CONSTRAINT mytable_related_id_fkey FOREIGN KEY (related_id)
        REFERENCES related (id)
);

CREATE INDEX related_id__platform__index ON mytable (related_id, platform);
CREATE UNIQUE INDEX some_other_index ON mytable (related_id, lower(name::text));

查詢和計劃：

此查詢返回 0 行：

EXPLAIN ANALYZE
SELECT * FROM mytable
WHERE related_id=1 AND platform='p2'::platforms
LIMIT 20;

Limit  (cost=0.00..14.07 rows=20 width=737) (actual time=12863.465..12863.465 rows=0 loops=1)
   -&gt;  Seq Scan on mytable  (cost=0.00..1492790.47 rows=2122653 width=737) (actual time=12863.464..12863.464 rows=0 loops=1)
         Filter: ((related_id = 1) AND (platform = 'p2'::platforms))
         Rows Removed by Filter: 21076656
Planning time: 3.540 ms
Execution time: 12868.190 ms

此查詢還返回 0 行：

EXPLAIN ANALYZE
SELECT * FROM mytable
WHERE related_id=1 AND platform='p2'::platforms;

Bitmap Heap Scan on mytable  (cost=60533.63..1295799.94 rows=2122653 width=737) (actual time=0.890..0.890 rows=0 loops=1)
Recheck Cond: ((related_id = 1) AND (platform = 'p2'::platforms))
 -&gt;  Bitmap Index Scan on related_id__platform__index  (cost=0.00..60002.97 rows=2122653 width=0) (actual time=0.888..0.888 rows=0 loops=1)
        Index Cond: ((related_id = 1) AND (platform = 'p2'::platforms))
Planning time: 0.827 ms
Execution time: 1.104 ms

此查詢返回 20 行（沒有LIMIT它會超過 200 萬行）：

EXPLAIN ANALYZE
SELECT * FROM mytable
WHERE related_id=1 AND platform='p1'::platforms
LIMIT 20;

Limit  (cost=0.44..70.95 rows=20 width=737) (actual time=0.759..0.995 rows=20 loops=1)
  -&gt;  Index Scan using related_id__platform__index on mytable  (cost=0.44..1217669.26 rows=345388 width=737) (actual time=0.759..0.993 rows=20 loops=1)
        Index Cond: ((related_id = 1) AND (platform = 'p1'::platforms))
Planning time: 5.776 ms
Execution time: 2.476 ms

此查詢返回 2 行：

EXPLAIN ANALYZE
SELECT * FROM mytable
WHERE related_id=1 AND platform='p3'::platforms LIMIT 20;

Limit  (cost=0.44..80.37 rows=20 width=737) (actual time=0.014..0.016 rows=2 loops=1)
  -&gt;  Index Scan using related_id__platform__index on mytable  (cost=0.44..99497.62 rows=24894 width=737) (actual time=0.014..0.015 rows=2 loops=1)
        Index Cond: ((related_id = 1) AND (platform = 'p3'::platforms))
Planning time: 0.972 ms
Execution time: 0.123 ms

Postgres 在估計查詢中謂詞組合的頻率方面做得不好：
SELECT * FROM tbl
WHERE  related_id = 1 AND platform = 'p2'::platforms
LIMIT  20;
您的每個謂詞本身都不是很有選擇性 - Postgres 可以使用這些資訊（“最常見的值”） - 假設您的統計資訊是最新的：
總共約 2100 萬行。
~ 300 萬行WHERE related_id=1
~ 800 萬行WHERE platform=p2
IOW，~ 每 7 行通過第一個過濾器，~ 每 3 行通過第二個過濾器。Postgres 進行（簡單的）數學運算，並期望大約每 20 行符合條件。由於沒有ORDER BY，因此任何20 個符合條件的行都可以。最快的方法應該是順序掃描表並在大約 400 行之後完成 - 只有幾個數據頁，非常便宜。
使用任何索引都會增加一些額外的成本，Postgres 需要掃描索引和表。（例外：僅索引掃描，這對於您的是不可能的SELECT *）。只有在 Postgres 否則必須閱讀足夠多的額外頁面以估計成本更高的情況下，這才有意義。這就是我將如何解釋您看到對 small 的順序掃描LIMIT，但對 big (或沒有) 的點陣圖索引掃描LIMIT。
不幸的是，您的謂詞組合出乎意料地罕見。Postgres 必須掃描整個表才能找到只有 2 個符合條件的行。（無論如何，該指數實際上會便宜得多。）
2 行WHERE related_id=1 AND platform=p2
Postgres無法使用多列中值的***組合頻率。***想一想：收集這樣的統計數據很快就會失控。
對於這種特殊情況，一個非常簡單有效的解決方案：創建部分索引：
CREATE INDEX related_id_1_platform_2_idx ON tbl (id)
WHERE  related_id = 1 AND platform = 'p2'::platforms;
這個超小索引（2 行）不僅會完美匹配您的查詢，而且還會為您的特定組合（條目pg_class.reltuples）提供計數估計。實際的索引列大多與此無關，選擇一個小列，通常最好使其成為 PK。
如果兩個謂詞之一可以更改，則有一種更通用的方法。說，如果related_id = 1是穩定條件，則創建：
CREATE INDEX related_id_1_idx ON tbl (platform)
WHERE  related_id = 1;
索引列再次相關。這可能不足以扭轉局面，因為 Postgres 只收集功能索引的索引列的完整統計資訊（否則它依賴於基礎表的統計資訊）。我提議：
CREATE INDEX related_id_1_func_idx ON tbl ((platform::text::platforms))  -- double parens!
WHERE  related_id = 1;
注意額外的一對括號 - 強制轉換速記的語法必要性。
這個表達式platform::text::platforms實際上並沒有改變任何東西——它讓你enum來迴text轉換。但它使 Postgres 收集關於（假定的）新值的完整統計資訊。
現在，（之後ANALYZE tbl）我們有完整的統計數據，包括最常見的platformfor值related_id = 1。
檢查：
SELECT *
FROM   pg_stats
WHERE  schemaname = 'public'  -- actual schema
AND    tablename  = 'related_id_1_func_idx';  -- actual idx name
Postgres 應該為您的案例選擇索引 - 如果您在查詢中重複相同的表達式。所以：
SELECT ...
WHERE related_id = 1 AND platform::text::platforms = 'p2'::platforms;
有關的：
未使用但影響查詢的索引
關於Postgres 統計中最常見的值：
加速 GROUP BY，HAVING COUNT 查詢

引用自：https://dba.stackexchange.com/questions/132235

當沒有結果並且指定了 LIMIT 時，SELECT 非常慢

相關問答

我們可以為 JSONB 數據類型的鍵/值創建索引嗎？

使用帶有非重音且僅帶有右端萬用字元的 ILIKE

PostgreSQL 規劃器對分佈不均勻的列的錯誤決策

使用大 IN 優化 Postgres 查詢

大表中的慢速索引掃描

使用 GIN 索引位串