Postgresql

Postgres is performing a sequential scan instead of an index scan

  • April 7, 2019

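I have a table with about 10 million rows and an index on a date column. When I try to extract the distinct values of the indexed column, Postgres runs a sequential scan even though the result set has only 26 items. Why does the optimizer choose this plan, and what can I do to avoid it?

From other answers, I suspect this has as much to do with the query as with the index.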

explain select "labelDate" from pages group by "labelDate";
                             QUERY PLAN
-----------------------------------------------------------------------
HashAggregate  (cost=524616.78..524617.04 rows=26 width=4)
  Group Key: "labelDate"
  ->  Seq Scan on pages  (cost=0.00..499082.42 rows=10213742 width=4)
(3 rows)

Table structure:

http=# \d pages
                                      Table "public.pages"
    Column      |          Type          |        Modifiers
-----------------+------------------------+----------------------------------
pageid          | integer                | not null default nextval('...
createDate      | integer                | not null
archive         | character varying(16)  | not null
label           | character varying(32)  | not null
wptid           | character varying(64)  | not null
wptrun          | integer                | not null
url             | text                   |
urlShort        | character varying(255) |
startedDateTime | integer                |
renderStart     | integer                |
onContentLoaded | integer                |
onLoad          | integer                |
PageSpeed       | integer                |
rank            | integer                |
reqTotal        | integer                | not null
reqHTML         | integer                | not null
reqJS           | integer                | not null
reqCSS          | integer                | not null
reqImg          | integer                | not null
reqFlash        | integer                | not null
reqJSON         | integer                | not null
reqOther        | integer                | not null
bytesTotal      | integer                | not null
bytesHTML       | integer                | not null
bytesJS         | integer                | not null
bytesCSS        | integer                | not null
bytesImg        | integer                | not null
bytesFlash      | integer                | not null
bytesJSON       | integer                | not null
bytesOther      | integer                | not null
numDomains      | integer                | not null
labelDate       | date                   |
TTFB            | integer                |
reqGIF          | smallint               | not null
reqJPG          | smallint               | not null
reqPNG          | smallint               | not null
reqFont         | smallint               | not null
bytesGIF        | integer                | not null
bytesJPG        | integer                | not null
bytesPNG        | integer                | not null
bytesFont       | integer                | not null
maxageMore      | smallint               | not null
maxage365       | smallint               | not null
maxage30        | smallint               | not null
maxage1         | smallint               | not null
maxage0         | smallint               | not null
maxageNull      | smallint               | not null
numDomElements  | integer                | not null
numCompressed   | smallint               | not null
numHTTPS        | smallint               | not null
numGlibs        | smallint               | not null
numErrors       | smallint               | not null
numRedirects    | smallint               | not null
maxDomainReqs   | smallint               | not null
bytesHTMLDoc    | integer                | not null
fullyLoaded     | integer                |
cdn             | character varying(64)  |
SpeedIndex      | integer                |
visualComplete  | integer                |
gzipTotal       | integer                | not null
gzipSavings     | integer                | not null
siteid          | numeric                |
Indexes:
   "pages_pkey" PRIMARY KEY, btree (pageid)
   "pages_date_url" UNIQUE CONSTRAINT, btree ("urlShort", "labelDate")
   "idx_pages_cdn" btree (cdn)
   "idx_pages_labeldate" btree ("labelDate") CLUSTER
   "idx_pages_urlshort" btree ("urlShort")
Triggers:
   pages_label_date BEFORE INSERT OR UPDATE ON pages
     FOR EACH ROW EXECUTE PROCEDURE fix_label_date()

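This is a known issue with Postgres optimization. If the distinct values are few, as in your case, and you are on version 8.4 or later, there is a very fast workaround using a recursive query, described here: Loose Indexscan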
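For versions between 8.4 and 9.2, where LATERAL is not available, the same idea can be written with a scalar subquery. The following is only a sketch adapted to the pages table from the question, in the style of the wiki page; the CTE name t is an arbitrary choice, and the query has not been run against the asker's data:

```sql
WITH RECURSIVE t AS (
   -- anchor: the smallest "labelDate" (the parentheses are required)
   ( SELECT "labelDate" FROM pages ORDER BY "labelDate" LIMIT 1 )
   UNION ALL
   -- step: the next larger value, or NULL once there is none left
   SELECT ( SELECT "labelDate"
            FROM pages
            WHERE "labelDate" > t."labelDate"
            ORDER BY "labelDate"
            LIMIT 1 )
   FROM t
   WHERE t."labelDate" IS NOT NULL   -- stop after the NULL marker row
)
SELECT "labelDate"
FROM t
WHERE "labelDate" IS NOT NULL;       -- drop the NULL marker row
```

Each step is a single probe into idx_pages_labeldate, so with 26 distinct values the query should need roughly 26 short index lookups rather than one pass over all 10 million rows.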

With LATERAL (which requires version 9.3 or later), your query can be rewritten as:

WITH RECURSIVE pa AS 
( -- start with the smallest "labelDate"
  ( SELECT "labelDate" FROM pages ORDER BY "labelDate" LIMIT 1 ) 
  UNION ALL
    -- then repeatedly fetch the next larger value via the index
    SELECT n."labelDate" 
    FROM pa AS p
         , LATERAL 
              ( SELECT "labelDate" 
                FROM pages 
                WHERE "labelDate" > p."labelDate" 
                ORDER BY "labelDate" 
                LIMIT 1
              ) AS n
) 
SELECT "labelDate" 
FROM pa ;
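One difference from the original GROUP BY query: "labelDate" is nullable, and GROUP BY returns a NULL group when NULL rows exist, whereas the recursive query does not reliably do so. If that group matters, a variant along these lines could append it explicitly. This is a sketch under that assumption, not something from the linked answers, and it has not been tested against this table:

```sql
WITH RECURSIVE pa AS (
  ( SELECT "labelDate" FROM pages ORDER BY "labelDate" LIMIT 1 )
  UNION ALL
  SELECT n."labelDate"
  FROM pa AS p,
       LATERAL ( SELECT "labelDate"
                 FROM pages
                 WHERE "labelDate" > p."labelDate"
                 ORDER BY "labelDate"
                 LIMIT 1 ) AS n
)
-- non-NULL distinct values found by the recursive walk
SELECT "labelDate" FROM pa WHERE "labelDate" IS NOT NULL
UNION ALL
-- add a single NULL row only if the table contains NULL dates
SELECT NULL::date
WHERE EXISTS ( SELECT 1 FROM pages WHERE "labelDate" IS NULL );
```

Prefixing either form with EXPLAIN ANALYZE shows whether the planner actually uses index scans on idx_pages_labeldate for the individual lookups.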

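Erwin Brandstetter has a thorough explanation and several variants of the query in this answer (to a related but different question): Optimize GROUP BY query to retrieve latest record per user

Source: https://dba.stackexchange.com/questions/105537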