PostgreSQL
Postgres is performing a sequential scan instead of an index scan
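I have a table with about 10 million rows and an index on a date field. When I try to extract the unique values of the indexed field, Postgres runs a sequential scan even though the result set has only 26 items. Why is the optimizer picking this plan? And what can I do to avoid it?
From other answers, I suspect this has as much to do with the query as with the index.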
explain select "labelDate" from pages group by "labelDate";
                              QUERY PLAN
-----------------------------------------------------------------------
 HashAggregate  (cost=524616.78..524617.04 rows=26 width=4)
   Group Key: "labelDate"
   ->  Seq Scan on pages  (cost=0.00..499082.42 rows=10213742 width=4)
(3 rows)
Table structure:
http=# \d pages
                       Table "public.pages"
     Column      |          Type          |        Modifiers
-----------------+------------------------+----------------------------------
 pageid          | integer                | not null default nextval('...
 createDate      | integer                | not null
 archive         | character varying(16)  | not null
 label           | character varying(32)  | not null
 wptid           | character varying(64)  | not null
 wptrun          | integer                | not null
 url             | text                   |
 urlShort        | character varying(255) |
 startedDateTime | integer                |
 renderStart     | integer                |
 onContentLoaded | integer                |
 onLoad          | integer                |
 PageSpeed       | integer                |
 rank            | integer                |
 reqTotal        | integer                | not null
 reqHTML         | integer                | not null
 reqJS           | integer                | not null
 reqCSS          | integer                | not null
 reqImg          | integer                | not null
 reqFlash        | integer                | not null
 reqJSON         | integer                | not null
 reqOther        | integer                | not null
 bytesTotal      | integer                | not null
 bytesHTML       | integer                | not null
 bytesJS         | integer                | not null
 bytesCSS        | integer                | not null
 bytesImg        | integer                | not null
 bytesFlash      | integer                | not null
 bytesJSON       | integer                | not null
 bytesOther      | integer                | not null
 numDomains      | integer                | not null
 labelDate       | date                   |
 TTFB            | integer                |
 reqGIF          | smallint               | not null
 reqJPG          | smallint               | not null
 reqPNG          | smallint               | not null
 reqFont         | smallint               | not null
 bytesGIF        | integer                | not null
 bytesJPG        | integer                | not null
 bytesPNG        | integer                | not null
 bytesFont       | integer                | not null
 maxageMore      | smallint               | not null
 maxage365       | smallint               | not null
 maxage30        | smallint               | not null
 maxage1         | smallint               | not null
 maxage0         | smallint               | not null
 maxageNull      | smallint               | not null
 numDomElements  | integer                | not null
 numCompressed   | smallint               | not null
 numHTTPS        | smallint               | not null
 numGlibs        | smallint               | not null
 numErrors       | smallint               | not null
 numRedirects    | smallint               | not null
 maxDomainReqs   | smallint               | not null
 bytesHTMLDoc    | integer                | not null
 fullyLoaded     | integer                |
 cdn             | character varying(64)  |
 SpeedIndex      | integer                |
 visualComplete  | integer                |
 gzipTotal       | integer                | not null
 gzipSavings     | integer                | not null
 siteid          | numeric                |
Indexes:
    "pages_pkey" PRIMARY KEY, btree (pageid)
    "pages_date_url" UNIQUE CONSTRAINT, btree ("urlShort", "labelDate")
    "idx_pages_cdn" btree (cdn)
    "idx_pages_labeldate" btree ("labelDate") CLUSTER
    "idx_pages_urlshort" btree ("urlShort")
Triggers:
    pages_label_date BEFORE INSERT OR UPDATE ON pages FOR EACH ROW EXECUTE PROCEDURE fix_label_date()
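This is a known limitation of the Postgres planner: it does not implement a loose index scan (also called a skip scan), so finding the distinct values means reading every row or every index entry, and the planner estimates the sequential scan as the cheaper way to do that. If the distinct values are few, as in your case, and you are on version 8.4 or later, a very fast workaround using a recursive query is described here: Loose Indexscan.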
Your query can be rewritten as follows (LATERAL requires version 9.3+; the mixed-case column name must be double-quoted):

WITH RECURSIVE pa AS
  ( ( SELECT "labelDate" FROM pages ORDER BY "labelDate" LIMIT 1 )
    UNION ALL
    SELECT n."labelDate"
    FROM pa AS p
       , LATERAL
           ( SELECT "labelDate"
             FROM pages
             WHERE "labelDate" > p."labelDate"
             ORDER BY "labelDate"
             LIMIT 1
           ) AS n
  )
SELECT "labelDate"
FROM pa ;
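
On servers older than 9.3, where LATERAL is not available, the same idea can be expressed with a correlated subquery in the SELECT list, along the lines of the Loose Indexscan wiki page. The following is only a sketch, assuming the table and column names from the question; it has not been run against this data:

```sql
-- Emulated loose index scan without LATERAL
-- (assumes 8.4+, the first release with WITH RECURSIVE).
WITH RECURSIVE pa AS (
    -- Anchor: the smallest "labelDate", which the index on the column can supply.
    ( SELECT "labelDate" FROM pages ORDER BY "labelDate" LIMIT 1 )
  UNION ALL
    -- Step: the next larger value; the subquery yields NULL when none is left.
    SELECT ( SELECT "labelDate"
             FROM pages
             WHERE "labelDate" > p."labelDate"
             ORDER BY "labelDate"
             LIMIT 1 )
    FROM pa AS p
    WHERE p."labelDate" IS NOT NULL   -- stop after the terminating NULL row
)
SELECT "labelDate"
FROM pa
WHERE "labelDate" IS NOT NULL;        -- drop the terminating NULL
```

Each recursion step is a single index probe, so the total work should scale with the number of distinct values (26 here) rather than with the 10 million rows. This variant does not return a NULL group, which the original GROUP BY would do if the column contains NULLs. Prefixing the statement with EXPLAIN ANALYZE is a reasonable way to check that the plan actually uses idx_pages_labeldate.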
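Erwin Brandstetter has a thorough explanation and several variations of the query in this answer (on a related but different question): Optimize GROUP BY query to retrieve latest record per user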