Index not used in date range query

July 21, 2020

I have a query that is not using an existing index, and I do not understand why.

The table:

mustang=# \d+ bss.amplifier_saturation
                                              Table "bss.amplifier_saturation"
Column |           Type           |                             Modifiers                             | Storage | Description 
--------+--------------------------+-------------------------------------------------------------------+---------+-------------
value  | integer                  | not null                                                          | plain   | 
target | integer                  | not null                                                          | plain   | 
start  | timestamp with time zone | not null                                                          | plain   | 
end    | timestamp with time zone | not null                                                          | plain   | 
id     | integer                  | not null default nextval('amplifier_saturation_id_seq'::regclass) | plain   | 
lddate | timestamp with time zone | not null default now()                                            | plain   | 
Indexes:
   "amplifier_saturation_pkey" PRIMARY KEY, btree (id)
   "amplifier_saturation_target_start_end_key" UNIQUE CONSTRAINT, btree (target, start, "end")
   "amplifier_saturation_end" btree ("end")
   "amplifier_saturation_lddate" btree (lddate)
   "amplifier_saturation_start" btree (start)
   "amplifier_saturation_target" btree (target)
   "amplifier_saturation_value" btree (value)

The query and its plan:

mustang=# explain select max(lddate) from bss.amplifier_saturation
where start >= '1987-12-31 00:00:00'
and   start <= '1988-04-09 00:00:00';
                                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Result  (cost=189.41..189.42 rows=1 width=0)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.00..189.41 rows=1 width=8)
          ->  Index Scan Backward using amplifier_saturation_lddate on amplifier_saturation  (cost=0.00..2475815.50 rows=13071 width=8)
                Index Cond: (lddate IS NOT NULL)
                Filter: ((start >= '1987-12-31 00:00:00-08'::timestamp with time zone) AND (start <= '1988-04-09 00:00:00-07'::timestamp with time zone))

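Why does this not use the index amplifier_saturation_start? It seems to me that the database should scan that index to find the start date, continue through it to collect all entries up to the end date, and finally sort that (small subset of) data to get the maximum lddate (something like pp 40-41 of SQL Performance Explained).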

In desperation I also tried an index on (start, start desc), but it did not help.

Incidentally, select count(*) works just fine:

mustang=# explain select count(*) from bss.amplifier_saturation
where start >= '1987-12-31 00:00:00'
and   start <= '1988-04-09 00:00:00';
                                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=38711.84..38711.85 rows=1 width=0)
  ->  Index Scan using amplifier_saturation_start on amplifier_saturation  (cost=0.00..38681.47 rows=12146 width=0)
        Index Cond: ((start >= '1987-12-31 00:00:00-08'::timestamp with time zone) AND (start <= '1988-04-09 00:00:00-07'::timestamp with time zone))

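Running ANALYZE did not help.
pg_stats shows a reasonable spread of values for start, which seems to argue for using the index.
Setting the statistics target to 10,000 on either column (start or lddate) did not help.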
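For reference, the statistics steps described above might look roughly like the following. This is a sketch only: the exact statements that were run are not shown in the question, and the choice of pg_stats columns to inspect is an assumption.

```sql
-- Raise the per-column statistics target (10000 is the maximum),
-- then gather fresh statistics for the table.
ALTER TABLE bss.amplifier_saturation ALTER COLUMN start  SET STATISTICS 10000;
ALTER TABLE bss.amplifier_saturation ALTER COLUMN lddate SET STATISTICS 10000;
ANALYZE bss.amplifier_saturation;

-- Inspect what the planner knows about the two columns.
SELECT attname, null_frac, n_distinct, correlation
FROM   pg_stats
WHERE  schemaname = 'bss'
AND    tablename  = 'amplifier_saturation'
AND    attname IN ('start', 'lddate');
```

Note that these are per-column statistics. They say nothing about how start and lddate relate to each other, so a larger target may not change the planner's choice here.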

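Perhaps I should explain why I think the plan is wrong. The table contains 30,000,000 rows. Only 3,500 of them are in the date range. But maybe that is still too many to read individually?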

Adding an index on (lddate desc, start) works (I am not sure whether the desc is required). The query can then use a pure index approach (IIUC) and runs much faster:

mustang=# create index tmp_as on bss.amplifier_saturation (lddate desc, start);
CREATE INDEX
mustang=# explain select max(lddate) from bss.amplifier_saturation
where start >= '1987-12-31 00:00:00'
and   start <= '1988-04-09 00:00:00';
                                                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result  (cost=69.76..69.77 rows=1 width=0)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.00..69.76 rows=1 width=8)
          ->  Index Scan using tmp_as on amplifier_saturation  (cost=0.00..861900.22 rows=12356 width=8)
                Index Cond: ((lddate IS NOT NULL) AND (start >= '1987-12-31 00:00:00-08'::timestamp with time zone) AND (start <= '1988-04-09 00:00:00-07'::timestamp with time zone))

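So, to answer my own question, it seems that accessing the data 3,500 times is estimated to cost more than scanning 30,000,000 values (yay, spinning disks), while a pure index scan is clearly better.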

Maybe someone smarter than me will come up with a better answer?

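Explanation
My question is: why does this not use the index amplifier_saturation_start?
Even with 30,000,000 rows, only 3,500 in the date range, it can be faster to read tuples from the top of the index amplifier_saturation_lddate on lddate. The first row that passes the filter on start can be returned as is. No sort step is needed. With a perfectly random distribution, a little below 9,000 index tuples would have to be checked on average (30,000,000 / 3,500 ≈ 8,571).
Using amplifier_saturation_start, Postgres would still have to determine max(lddate) after fetching all 3,500 qualifying rows. It is a close call. The decision depends on the gathered statistics and your cost settings. Depending on data distribution and other details, one or the other will actually be faster, and one or the other will be expected to be faster, which is not necessarily the same one.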
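One way to see which strategy is actually faster on real data is to time both. The sketch below assumes that wrapping the range filter in a subquery with OFFSET 0 keeps the planner from applying the min/max index optimization. That is a commonly used trick rather than a documented guarantee, and the alias sub is made up for illustration. Keep in mind that EXPLAIN ANALYZE really executes the query.

```sql
-- Strategy the planner picked: walk amplifier_saturation_lddate backward
-- and stop at the first row that passes the filter on start.
EXPLAIN (ANALYZE, BUFFERS)
SELECT max(lddate)
FROM   bss.amplifier_saturation
WHERE  start >= '1987-12-31 00:00:00'
AND    start <= '1988-04-09 00:00:00';

-- Alternative: fetch all qualifying rows first, then aggregate.
-- OFFSET 0 is used as an optimization fence (an assumption, see above).
EXPLAIN (ANALYZE, BUFFERS)
SELECT max(lddate)
FROM  (
   SELECT lddate
   FROM   bss.amplifier_saturation
   WHERE  start >= '1987-12-31 00:00:00'
   AND    start <= '1988-04-09 00:00:00'
   OFFSET 0
   ) sub;
```

Comparing actual time and buffer counts between the two outputs shows whether the estimate matched reality for this particular date range.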
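Better index
This can be substantially faster with a multicolumn index on (lddate, start), as you already found yourself. This way, Postgres can use an index-only scan and not touch the heap (the table) at all.
But there is another minor thing you could improve. Did you wonder about this detail in your EXPLAIN output?
Index Cond: ((lddate IS NOT NULL) AND ...
Why does Postgres have to exclude NULL values?
Because NULL sorts after the greatest value in ASCENDING order, or before it in DESCENDING order. If there are NULL values, the maximum non-null value returned by the aggregate function max() is not at the beginning or end of the index. Adding NULLS LAST | FIRST adjusts the sort order to the characteristics of max() (and makes the opposite, min(), more expensive). Since we are mostly interested in the latest timestamp, DESC NULLS LAST is the better choice.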
CREATE INDEX tmp_as ON bss.amplifier_saturation (lddate DESC NULLS LAST, start);
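Now, your table column lddate obviously has no NULL values, being defined NOT NULL. The effect on performance will be negligible in this particular case. It is still worth mentioning for cases that can have NULL.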
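The sort-order point can be checked on a throwaway value list, independent of the table in question. A small sketch (the alias t and column x are made up):

```sql
-- Default: NULL sorts as if larger than any non-null value,
-- so it comes first in descending order.
SELECT x FROM (VALUES (1), (2), (NULL)) AS t(x) ORDER BY x DESC;
-- returns NULL, 2, 1

-- NULLS LAST moves NULL to the end, so the first row is the maximum.
SELECT x FROM (VALUES (1), (2), (NULL)) AS t(x) ORDER BY x DESC NULLS LAST;
-- returns 2, 1, NULL

-- max() ignores NULL, matching the first row of the second ordering.
SELECT max(x) FROM (VALUES (1), (2), (NULL)) AS t(x);
-- returns 2
```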
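The other index option would be on (start, lddate), basically a souped-up amplifier_saturation_start index, which would also allow index-only scans. Depending on data distribution and the actual parameter values in your query, one or the other will be faster.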
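A minimal sketch of trying that second option, assuming a Postgres version that supports index-only scans (9.2 or later). The index name is hypothetical.

```sql
-- Hypothetical index name; column order as suggested above.
CREATE INDEX amplifier_saturation_start_lddate
    ON bss.amplifier_saturation (start, lddate);

-- Index-only scans rely on the visibility map, which VACUUM maintains.
VACUUM ANALYZE bss.amplifier_saturation;

-- Check which index the planner now prefers for the original query.
EXPLAIN
SELECT max(lddate)
FROM   bss.amplifier_saturation
WHERE  start >= '1987-12-31 00:00:00'
AND    start <= '1988-04-09 00:00:00';
```

With start as the leading column, the range predicate can narrow the index scan directly, and max(lddate) is computed from the index entries in that range. With lddate leading, the scan can stop at the first entry that matches. Which one wins depends on the data, as noted above.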
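Two notes concerning timestamp
Your table columns are timestamptz, but your query predicates use timestamp literals. Postgres derives the time zone from your current timezone setting and adjusts accordingly. This may or may not be as intended. It certainly makes the query volatile - dependent on a setting of your session. That would be problematic for a call that could be made from different time zones (with differing session settings). In that case you would rather use an explicit offset or the AT TIME ZONE construct to make it stable. Details:
Ignoring time zones altogether in Rails and PostgreSQL
You typically want to exclude the upper bound for correctness: < instead of <=.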
select max(lddate)
from   bss.amplifier_saturation
where  start >= '1987-12-31 00:00:00'::timestamp AT TIME ZONE 'PST'
and    start <  '1988-04-09 00:00:00 PST'::timestamptz; -- shorter
PST (Pacific Standard Time) is just a random example time zone.
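The dependence on the session setting can be demonstrated without touching the table. A sketch, using America/Los_Angeles and UTC as arbitrary example settings:

```sql
BEGIN;

-- With a US Pacific session setting, the bare literal means midnight PST.
SET LOCAL timezone = 'America/Los_Angeles';
SELECT '1987-12-31 00:00:00'::timestamptz
     = '1987-12-31 00:00:00-08'::timestamptz;   -- true

-- The same bare literal is a different instant in a UTC session.
SET LOCAL timezone = 'UTC';
SELECT '1987-12-31 00:00:00'::timestamptz
     = '1987-12-31 00:00:00-08'::timestamptz;   -- false

-- An explicit offset and AT TIME ZONE agree regardless of the setting.
SELECT '1987-12-31 00:00:00-08'::timestamptz
     = ('1987-12-31 00:00:00'::timestamp AT TIME ZONE 'PST');   -- true

ROLLBACK;
```

SET LOCAL keeps the changed setting confined to the transaction, so the session's own timezone is left as it was.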

Source: https://dba.stackexchange.com/questions/90128
