Index scan on a UNIQUE index to get count(*)

June 12, 2018

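I have a table t with about 23 million rows (4248 MB in size). It has a column row_id with a NOT NULL constraint, and a unique index p1 on t(row_id).

When I run select count(*) from t to count all rows in the table, the planner tells me: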

Seq Scan on t  (cost=0.00..686191.06 rows=23176906 width=0)

I was expecting a fast Index Only Scan (the index p1 takes only 698 MB, about 6 times less).

If I SET enable_seqscan = off, the planner still insists on reading the table rows:

QUERY PLAN
 ->  Bitmap Heap Scan on t  (cost=210923.32..897114.38 rows=23176906 width=0)
       ->  Bitmap Index Scan on p1  (cost=0.00..205129.09 rows=23176906 width=0)

Why is the unique index ignored in this case? What is wrong?

I am using PostgreSQL 10.4.

For a clean-room test, I did the following:

create table tmp
(
  row_id      varchar(15) unique not null,
  <10 original cols>
);

insert into tmp (row_id, <10 cols>) select row_id, <10 cols> from t;
commit;
analyze tmp;

set enable_seqscan = on;
explain (analyze, buffers) select count(*) from tmp;
QUERY PLAN
Aggregate  (cost=744070.45..744070.46 rows=1 width=8) (actual time=5631.501..5631.502 rows=1 loops=1)
 Buffers: shared hit=209109 read=245254
 ->  Seq Scan on tmp  (cost=0.00..686128.96 rows=23176596 width=0) (actual time=0.014..3481.967 rows=23176906 loops=1)
       Buffers: shared hit=209109 read=245254
Planning time: 0.064 ms
Execution time: 5631.531 ms


SET enable_seqscan = off;
explain (analyze, buffers) select count(*) from tmp;
QUERY PLAN
Aggregate  (cost=980282.14..980282.15 rows=1 width=8) (actual time=16224.408..16224.408 rows=1 loops=1)
 Buffers: shared hit=26285 read=542015
 ->  Bitmap Heap Scan on tmp  (cost=236211.69..922340.65 rows=23176596 width=0) (actual time=10030.115..14157.288 rows=23176906 loops=1)
       Heap Blocks: exact=454363
       Buffers: shared hit=26285 read=542015
       ->  Bitmap Index Scan on tmp_row_id_key  (cost=0.00..230417.54 rows=23176596 width=0) (actual time=9929.582..9929.582 rows=23176906 loops=1)
             Buffers: shared hit=26285 read=87652
Planning time: 0.051 ms
Execution time: 16229.303 ms

So far there is no parallel index scan. For some obscure reason, PostgreSQL insists on visiting the table.

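Answer:
As for why you are not getting an index scan with SET enable_seqscan = off: you should be getting an index-only scan. I cannot reproduce your case with the data you have provided, and this certainly works on PostgreSQL 10.4. I cannot speak to your specific case; in the real world there are many reasons why you might not get an index scan. Ultimately, debugging a problem along these lines ends in the simple answer "planner estimates", but getting there requires more information about your environment, your configuration, and the query plans with and without SET enable_seqscan = off.
Sample data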
BEGIN;
 CREATE TABLE foo ( x int NOT NULL UNIQUE );
 INSERT INTO foo (x) SELECT generate_series(1,1e6);
COMMIT;

ANALYZE foo; -- don't forget to analyze
With seq_scan
test=# EXPLAIN SELECT count(*) FROM foo;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
Finalize Aggregate  (cost=10633.55..10633.56 rows=1 width=8)
  ->  Gather  (cost=10633.33..10633.54 rows=2 width=8)
        Workers Planned: 2
        ->  Partial Aggregate  (cost=9633.33..9633.34 rows=1 width=8)
              ->  Parallel Seq Scan on foo  (cost=0.00..8591.67 rows=416667 width=0)
(5 rows)
Note that we are doing a "Parallel Seq Scan".
Without seq_scan
SET enable_seqscan = off;

test=# EXPLAIN SELECT count(*) FROM foo;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Finalize Aggregate  (cost=26616.97..26616.98 rows=1 width=8)
  ->  Gather  (cost=26616.76..26616.97 rows=2 width=8)
        Workers Planned: 2
        ->  Partial Aggregate  (cost=25616.76..25616.77 rows=1 width=8)
              ->  Parallel Index Only Scan using foo_x_key on foo  (cost=0.42..24575.09 rows=416667 width=0)
(5 rows)
Note that we are doing a "Parallel Index Only Scan".

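Community wiki answer:
Your question is addressed by the "count(*)" section of the PostgreSQL wiki page on index-only scans:
A traditional complaint about PostgreSQL, usually made when comparing it with MySQL (at least when using the MyISAM storage engine, which does not use MVCC), is that "count(*) is slow". Index-only scans can be used to satisfy these queries without any predicate to limit the number of rows returned, and without forcing an index to be used by specifying that the tuples should be ordered by an indexed column. In practice, however, that is not particularly likely.
It is important to realise that the planner is concerned with minimising the total cost of the query. With databases, the cost of I/O typically dominates. For that reason, "count(*) without any predicate" queries will only use an index-only scan if the index is significantly smaller than its table. This typically only happens when the table's rows are much wider than the entries of some index.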
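The size comparison the quoted passage refers to can be checked directly. The following is a minimal sketch, assuming the table and index names from the question (t and p1); it is not taken from either answer. It uses the standard pg_relation_size and pg_size_pretty functions together with the page counts the planner keeps in pg_class.

```sql
-- Compare the on-disk size of the table with that of its unique index.
-- 't' and 'p1' are the names used in the question; adjust as needed.
SELECT c.relname,
       c.relkind,                                   -- 'r' = table, 'i' = index
       pg_size_pretty(pg_relation_size(c.oid)) AS on_disk_size,
       c.relpages                                   -- page count estimate used by the planner
FROM pg_class AS c
WHERE c.relname IN ('t', 'p1');
```

With the figures from the question (4248 MB versus 698 MB) the index is clearly smaller, so size alone does not seem to explain the plan that was chosen.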
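If visibility cannot be determined by looking at the visibility map, an index-only scan still has to visit the heap tuple. So when examining your case there is a big "it depends". In the best case you get an index-only scan. Otherwise, if the visibility of tuples has to be checked, a sequential scan quickly becomes the winner, because it does not pay the cost of consulting the index first.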
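One way to see how much of the table the visibility map currently covers is sketched below. It assumes the clean-room table tmp from the question, which was filled with INSERT and then only analyzed. On PostgreSQL 10 a plain INSERT does not set visibility map bits, so it is likely (an assumption, not something the question confirms) that relallvisible is at or near zero until a VACUUM has run.

```sql
-- relallvisible is the number of heap pages marked all-visible in the
-- visibility map; the planner uses it when costing index-only scans.
-- Both columns are estimates maintained by VACUUM and ANALYZE.
SELECT relname, relpages, relallvisible
FROM pg_class
WHERE relname = 'tmp';

-- VACUUM sets the all-visible bits; ANALYZE refreshes the statistics.
VACUUM (ANALYZE) tmp;

-- Re-check the plan. If an Index Only Scan node appears, its
-- "Heap Fetches" line reports how many tuples still needed a heap visit.
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM tmp;
```

Whether the plan actually changes after the VACUUM depends on the planner's cost estimates for the table in question, so this should be read as a diagnostic step rather than a guaranteed fix.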
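You get a Bitmap Heap Scan with enable_seqscan = off, so the planner is behaving as advertised:
It is impossible to suppress sequential scans entirely, but turning this variable off discourages the planner from using one if there are other methods available.
Details are in other sections of the same wiki page.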
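To see which access path the planner falls back to once bitmap scans are discouraged as well, the settings can be changed for a single transaction. This is only a diagnostic sketch and is not suggested in either answer; enable_seqscan and enable_bitmapscan are standard planner settings, and tmp is the test table from the question.

```sql
BEGIN;
-- SET LOCAL lasts only until the end of this transaction.
SET LOCAL enable_seqscan = off;
SET LOCAL enable_bitmapscan = off;

-- Compare the estimated cost of this plan with the costs of the
-- Seq Scan and Bitmap Heap Scan plans shown in the question.
EXPLAIN SELECT count(*) FROM tmp;
ROLLBACK;
```

These switches only make the disabled plan types look very expensive to the planner, so they are useful for comparing estimated costs and are not meant as a production setting.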

Source: https://dba.stackexchange.com/questions/209331
