Creating a multicolumn index for WHERE and ORDER BY

December 5, 2021

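I'm trying to create an index that is used for both the WHERE and the ORDER BY clause. Reading the Postgres 14 documentation (11.4. Indexes and ORDER BY - https://www.postgresql.org/docs/14/indexes-ordering.html) led me to believe:
In addition to simply finding the rows to be returned by a query, an index may be able to deliver them in a specific sorted order. This allows a query's ORDER BY specification to be honored without a separate sorting step.
Wow, sounds great, let's try it! I created a test table and an index on the WHERE and ORDER BY columns, then populated the table with data: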
DROP TABLE IF EXISTS testdata;
CREATE TABLE testdata
(
   question_id   TEXT        NOT NULL UNIQUE PRIMARY KEY,
   answerer_id   TEXT        NOT NULL,
   question_date TIMESTAMPTZ NOT NULL,
   answer_date   TIMESTAMPTZ NOT NULL
);

DROP INDEX IF EXISTS idx1;
CREATE INDEX idx1 ON testdata (answerer_id, answer_date, question_date);

TRUNCATE testdata;
INSERT INTO testdata(question_id, answerer_id, question_date, answer_date)
SELECT CONCAT('question_', LPAD(i::TEXT, 4, '0')),
      CONCAT('answerer_', LPAD(FLOOR(RANDOM() * (99 - 1 + 1) + 1)::TEXT, 2, '0')),
      TIMESTAMPTZ '2021-01-01' + RANDOM() * INTERVAL '365 days',
      TIMESTAMPTZ '2022-01-01' + RANDOM() * INTERVAL '365 days'
FROM GENERATE_SERIES(1, 9999) AS t(i);

VACUUM (FULL, ANALYZE) testdata;

EXPLAIN ANALYSE
SELECT *
FROM testdata
WHERE answerer_id = 'answerer_09'
ORDER BY answer_date,
        question_date;
That's the sample data. Since answerer_id is a random number from 1 to 99, this query should return about 100 of the 10K rows (roughly 1% of the rows).
EXPLAIN ANALYSE on the query gives me:
Sort  (cost=108.49..108.75 rows=106 width=42) (actual time=2.194..3.555 rows=106 loops=1)
 Sort Key: answer_date, question_date
 Sort Method: quicksort  Memory: 33kB
 ->  Bitmap Heap Scan on testdata  (cost=5.11..104.92 rows=106 width=42) (actual time=0.057..1.188 rows=106 loops=1)
       Recheck Cond: (answerer_id = 'answerer_09'::text)
       Heap Blocks: exact=67
       ->  Bitmap Index Scan on idx1  (cost=0.00..5.08 rows=106 width=0) (actual time=0.032..0.040 rows=106 loops=1)
             Index Cond: (answerer_id = 'answerer_09'::text)
Planning Time: 0.154 ms
Execution Time: 4.856 ms
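So the database uses the index to find the rows that satisfy the WHERE clause, and then... sorts them with quicksort? Why not return the rows exactly as they appear in the index, already sorted?
Am I missing something? Do I need to create the index differently for it to be used for both WHERE and ORDER BY?
UPDATE:
Changing the query to: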
EXPLAIN ANALYSE
SELECT *
FROM testdata
WHERE answerer_id = 'answerer_09'
ORDER BY answer_date,
        question_date
LIMIT 30; -- NEW!
changes the result completely:
Limit  (cost=0.29..83.88 rows=30 width=42) (actual time=0.064..1.599 rows=30 loops=1)
 ->  Index Scan using idx1 on testdata  (cost=0.29..253.87 rows=91 width=42) (actual time=0.044..0.676 rows=30 loops=1)
       Index Cond: (answerer_id = 'answerer_09'::text)
Planning Time: 0.125 ms
Execution Time: 1.967 ms
If I change the limit to 40 or more, it reverts to using a sort, albeit a different kind (top-N heapsort):
Limit  (cost=105.95..106.05 rows=40 width=42) (actual time=1.853..3.205 rows=40 loops=1)
 ->  Sort  (cost=105.95..106.17 rows=91 width=42) (actual time=1.837..2.321 rows=40 loops=1)
       Sort Key: answer_date, question_date
       Sort Method: top-N heapsort  Memory: 30kB
       ->  Bitmap Heap Scan on testdata  (cost=4.99..103.07 rows=91 width=42) (actual time=0.054..1.037 rows=91 loops=1)
             Recheck Cond: (answerer_id = 'answerer_09'::text)
             Heap Blocks: exact=57
             ->  Bitmap Index Scan on idx1  (cost=0.00..4.97 rows=91 width=0) (actual time=0.034..0.042 rows=91 loops=1)
                   Index Cond: (answerer_id = 'answerer_09'::text)
Planning Time: 0.093 ms
Execution Time: 3.618 ms
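"Top-N heapsort" in this plan is a bounded sort. When the sort knows that only the first N rows will be consumed, it can keep just the N best rows seen so far in a heap instead of sorting every input row. The Python sketch below shows that general technique. It is not PostgreSQL's tuplesort code, and `top_n_heapsort` is a hypothetical name. It assumes numeric sort keys so they can be negated for Python's min-heap.

```python
import heapq
import random


def top_n_heapsort(rows, n, key):
    """Return the n smallest rows by key while holding at most n rows at a time.

    heapq is a min-heap, so entries carry the negated key: the heap root is then
    the worst of the rows kept so far, i.e. the one to evict. Assumes the key is
    a tuple of numbers, so every part can be negated.
    """
    if n <= 0:
        return []
    heap = []  # entries: (negated key, negated input position, row)
    for pos, row in enumerate(rows):
        entry = (tuple(-part for part in key(row)), -pos, row)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # The new row sorts before the worst kept row: replace the root.
            heapq.heapreplace(heap, entry)
    # Largest entry = smallest key (earliest position on ties), so sort descending.
    return [row for _, _, row in sorted(heap, reverse=True)]


rng = random.Random(42)
# 91 rows, as estimated in the plans above; numbers stand in for
# (answer_date, question_date).
rows = [(rng.random(), rng.random()) for _ in range(91)]
first_40 = top_n_heapsort(rows, 40, key=lambda r: r)
print(first_40 == sorted(rows)[:40])  # True: same rows as a full sort followed by LIMIT 40
```

The result equals a full sort followed by LIMIT, but the heap never holds more than N rows.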
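So the index is fine, and the database knows about it. Yet it ignores the index when it expects no limit, or a fairly large one.
What's the reason for this? Is sorting without the index somehow faster?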

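For roughly 1% of all rows, a plain index scan is typically not efficient. (Many factors play into this...) What you see is a bitmap index scan. Why? See:
Postgres not using index when index scan is much better option
A bitmap index scan cannot pass the index sort order on to the result. Hence the final sort step.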
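To make the ordering argument concrete, here is a toy Python model. It is an illustration under simplifying assumptions, not PostgreSQL's implementation:

- The "heap" is a list of rows in physical order, and the list position stands in for the TID.
- The "index" is the sorted list of `(answerer_id, answer_date, question_date, tid)` entries that `idx1` would hold.
- All function names are hypothetical.

A plain index scan follows the matching index range in key order. A bitmap scan first collects the matching TIDs, then reads the heap in physical order, which discards the index order.

```python
import bisect
import random
from datetime import datetime, timedelta


def make_heap(n_rows, seed=0):
    """Toy table: rows in insertion (physical) order; the list position plays the TID."""
    rng = random.Random(seed)
    rows = []
    for i in range(1, n_rows + 1):
        rows.append({
            "question_id": f"question_{i:04d}",
            "answerer_id": f"answerer_{rng.randint(1, 99):02d}",
            "question_date": datetime(2021, 1, 1) + timedelta(days=rng.random() * 365),
            "answer_date": datetime(2022, 1, 1) + timedelta(days=rng.random() * 365),
        })
    return rows


def build_index(heap):
    """Toy leaf level of idx1: entries sorted by (answerer_id, answer_date, question_date)."""
    return sorted(
        (row["answerer_id"], row["answer_date"], row["question_date"], tid)
        for tid, row in enumerate(heap)
    )


def matching_range(index, answerer_id):
    """Equality on the leading column selects one contiguous slice of the sorted index."""
    lo = bisect.bisect_left(index, (answerer_id,))
    hi = bisect.bisect_left(index, (answerer_id, datetime.max))
    return index[lo:hi]


def index_scan(heap, index, answerer_id):
    """Plain index scan: visit matching entries in key order, fetch each row by TID."""
    return [heap[tid] for *_, tid in matching_range(index, answerer_id)]


def bitmap_scan(heap, index, answerer_id):
    """Bitmap scan: remember matching TIDs, then read the heap in physical (TID) order."""
    bitmap = {tid for *_, tid in matching_range(index, answerer_id)}
    return [heap[tid] for tid in sorted(bitmap)]


def order_key(row):
    """The query's ORDER BY answer_date, question_date."""
    return (row["answer_date"], row["question_date"])


heap = make_heap(9999)
index = build_index(heap)
via_index = index_scan(heap, index, "answerer_09")
via_bitmap = bitmap_scan(heap, index, "answerer_09")

print(via_index == sorted(via_index, key=order_key))    # True: already in ORDER BY order
print(sorted(via_bitmap, key=order_key) == via_index)   # True: same rows once sorted
print(via_bitmap == sorted(via_bitmap, key=order_key))  # heap order generally differs, hence the Sort node
```

The equality condition on `answerer_id` fixes the leading index column. So the matching entries form one contiguous range that is already ordered by `(answer_date, question_date)`. The same index can therefore serve both WHERE and ORDER BY, but only through a plain index scan.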
You can "disable" competing query plans to "force" your index scan (for testing purposes only!):
SET enable_bitmapscan = off;
SET enable_seqscan = off;
Or you can lower the expected cost of random access with:
SET random_page_cost = 1;  -- or similar
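Or you can add a LIMIT that returns only a few result rows, like you did.
Any of these can convince the query planner to switch to an index scan without the additional sort step: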
Index Scan using idx1 on testdata  (cost=0.29..274.08 rows=104 width=42) (actual time=0.014..0.050 rows=104 loops=1)
 Index Cond: (answerer_id = 'answerer_09'::text)
Planning Time: 0.052 ms
Execution Time: 0.064 ms
db<>fiddle here
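For a test case with only a few rows and a mildly selective predicate, it's hard to tell whether a sequential scan, a bitmap index scan or a plain index scan will be fastest. Tests with a bigger table are more instructive.
Either way, the query planner decides strictly based on estimated cost. (SET enable_seqscan = off merely makes sequential scans look extremely expensive.) The plan expected to be cheapest wins. Table and column statistics, server configuration and cost settings should be as accurate as possible to get valid estimates, and good query plans.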
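The planner's switch between the two LIMIT plans in the question can be roughly reproduced from their cost figures. This assumes that a Limit node's estimated cost is its input's startup cost plus the fraction `limit / rows` of the input's remaining run cost, which is how the planner prorates costs under LIMIT.

- The helper names are hypothetical.
- The competing sort-based plan is treated as a fixed cost of 106.05, its displayed total for LIMIT 40. Its real estimate varies slightly with N.

```python
def limit_cost(startup, total, rows, limit):
    """Estimated total cost of a Limit over an input path: startup plus a share of run cost."""
    fraction = min(limit, rows) / rows
    return startup + (total - startup) * fraction


def crossover(startup, total, rows, competing_cost):
    """Smallest LIMIT at which this path's Limit cost exceeds a fixed competing cost."""
    for n in range(1, rows + 1):
        if limit_cost(startup, total, rows, n) > competing_cost:
            return n
    return None


# Index Scan using idx1 on testdata (cost=0.29..253.87 rows=91), from the LIMIT plans.
INDEX_STARTUP, INDEX_TOTAL, EST_ROWS = 0.29, 253.87, 91
# Limit over Sort + Bitmap Heap Scan (cost=105.95..106.05), from the LIMIT 40 plan.
SORT_PLAN_TOTAL = 106.05

print(round(limit_cost(INDEX_STARTUP, INDEX_TOTAL, EST_ROWS, 30), 2))  # 83.89
print(round(limit_cost(INDEX_STARTUP, INDEX_TOTAL, EST_ROWS, 40), 2))  # 111.75
print(crossover(INDEX_STARTUP, INDEX_TOTAL, EST_ROWS, SORT_PLAN_TOTAL))  # 38
```

The first figure matches the displayed Limit cost of 83.88, up to rounding of the displayed inputs. At LIMIT 40 the index-scan path would cost about 111.75, which is more than the 106.05 of the sort-based plan the planner actually picked. Under these assumptions the break-even point lies near 38 rows, which fits the switch the question observed between LIMIT 30 and LIMIT 40.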

Source: https://dba.stackexchange.com/questions/302153
