Slow ORDER BY with LIMIT

June 23, 2012

I have this query:

SELECT * 
FROM location 
WHERE to_tsvector('simple',unaccent2("city"))
  @@ to_tsquery('simple',unaccent2('wroclaw')) 
order by displaycount

and I'm happy with its plan:

Sort  (cost=3842.56..3847.12 rows=1826 width=123) (actual time=1.915..2.084 rows=1307 loops=1)
  Sort Key: displaycount
  Sort Method: quicksort  Memory: 206kB
  ->  Bitmap Heap Scan on location  (cost=34.40..3743.64 rows=1826 width=123) (actual time=0.788..1.208 rows=1307 loops=1)
        Recheck Cond: (to_tsvector('simple'::regconfig, unaccent2((city)::text)) @@ '''wroclaw'''::tsquery)
        ->  Bitmap Index Scan on location_lower_idx  (cost=0.00..33.95 rows=1826 width=0) (actual time=0.760..0.760 rows=1307 loops=1)
              Index Cond: (to_tsvector('simple'::regconfig, unaccent2((city)::text)) @@ '''wroclaw'''::tsquery)
Total runtime: 2.412 ms

But when I add LIMIT, the query takes more than 2 seconds:

SELECT * 
FROM location 
WHERE to_tsvector('simple',unaccent2("city"))
  @@ to_tsquery('simple',unaccent2('wroclaw')) 
order by displaycount 
limit 20

EXPLAIN output:

Limit  (cost=0.00..1167.59 rows=20 width=123) (actual time=2775.452..2775.643 rows=20 loops=1)
  ->  Index Scan using location_displaycount_index on location  (cost=0.00..106601.25 rows=1826 width=123) (actual time=2775.448..2775.637 rows=20 loops=1)
        Filter: (to_tsvector('simple'::regconfig, unaccent2((city)::text)) @@ '''wroclaw'''::tsquery)
Total runtime: 2775.693 ms

I think this is some problem with the combination of ORDER BY and LIMIT. How can I force PostgreSQL to use the full-text index and do the sorting at the end?

A subquery does not help:

SELECT * 
FROM (
   SELECT * 
   FROM location 
   WHERE to_tsvector('simple',unaccent2("city"))
      @@ to_tsquery('simple',unaccent2('wroclaw')) 
   order by displaycount
) t 
LIMIT 20;

Nor does this one:

SELECT * 
FROM (
   SELECT * 
   FROM location 
   WHERE to_tsvector('simple',unaccent2("city"))
      @@ to_tsquery('simple',unaccent2('wroclaw'))
) t 
order by displaycount 
LIMIT 20;

My guess is that this would fix your query:
SELECT * 
FROM   location 
WHERE     to_tsvector('simple',unaccent2(city))
      @@ to_tsquery('simple',unaccent2('wroclaw')) 
ORDER  BY to_tsvector('simple',unaccent2(city))
      @@ to_tsquery('simple',unaccent2('wroclaw')) DESC
        ,displaycount 
LIMIT  20;
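I repeat the WHERE condition as the first element of the ORDER BY clause. This is logically redundant: every row that passes the WHERE clause evaluates that expression to true, so it does not change the order. But the ORDER BY no longer matches the index location_displaycount_index, which should keep the query planner from assuming that it would be better to read rows in the order of that index, a plan that turns out to be far more expensive.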
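The underlying problem is that the query planner obviously misjudges the selectivity and/or the cost of your WHERE condition badly. I can only speculate why that is.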
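The two EXPLAIN outputs in the question show how the misjudgment plays out. PostgreSQL costs a Limit node by interpolating linearly between the startup cost and the total cost of its input, which amounts to assuming that the 1826 estimated matches are spread evenly along location_displaycount_index. Reproducing the Limit cost from the plan numbers:

```latex
\text{cost}_{\text{Limit}}
  = c_{\text{startup}} + \left(c_{\text{total}} - c_{\text{startup}}\right)\cdot\frac{20}{1826}
  = 0.00 + 106601.25 \cdot \frac{20}{1826}
  \approx 1167.59
```

The sort-based plan cannot return a row before its Bitmap Heap Scan (total cost 3743.64) has finished, so it looks more than three times as expensive. The row estimate itself (1826 vs. 1307 actual) is not far off; what the interpolation cannot see is that the matching rows evidently are not spread evenly in displaycount order, so the index scan ran for about 2.8 seconds before it had 20 of them.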
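Are you running autovacuum, which should also take care of running ANALYZE on your table? In other words, are your table statistics up to date? Does it make any difference if you run: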
ANALYZE location;
and then try again?
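To check whether (auto)ANALYZE has actually run on the table before trying again, the standard statistics view pg_stat_user_tables can be queried. A minimal sketch, assuming the table is named location as in the question:

```sql
-- When were statistics for "location" last gathered,
-- manually (last_analyze) or by autovacuum (last_autoanalyze)?
SELECT relname, last_analyze, last_autoanalyze, n_live_tup
FROM   pg_stat_user_tables
WHERE  relname = 'location';

-- Refresh the planner statistics for this table only.
ANALYZE location;
```

If both timestamps are NULL or old, stale statistics are a plausible part of the problem; afterwards, re-run EXPLAIN ANALYZE on the LIMIT query and compare the plans.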
It might also be that the selectivity of the @@ operator is misjudged. I imagine it is hard to estimate, for logical reasons.
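If my query does not solve the problem, and generally to verify the underlying theory, do one of these two things:
1. Temporarily drop the index location_displaycount_index.
2. Temporarily disable plain index scans by running: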
SET enable_indexscan = OFF;
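The latter is less invasive and only affects the current session. It leaves the bitmap heap scan and bitmap index scan methods enabled, which are what the faster plan uses.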
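The setting can be scoped even more narrowly than the session: SET LOCAL lasts only until the end of the current transaction. A sketch of the test that way; unaccent2 is the asker's own function from the question:

```sql
BEGIN;

-- Disable plain index scans for this transaction only;
-- bitmap index scans and bitmap heap scans stay enabled.
SET LOCAL enable_indexscan = off;

EXPLAIN ANALYZE
SELECT *
FROM   location
WHERE  to_tsvector('simple', unaccent2(city))
    @@ to_tsquery('simple', unaccent2('wroclaw'))
ORDER  BY displaycount
LIMIT  20;

-- SET LOCAL is discarded when the transaction ends.
ROLLBACK;
```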
Then re-run the query.
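For the first option, PostgreSQL's transactional DDL makes it possible to drop the index inside a transaction and roll the drop back afterwards. A sketch, with one caveat: DROP INDEX holds an ACCESS EXCLUSIVE lock on the table until the transaction ends, so other sessions cannot even read location in the meantime. This is only suitable for a test system or a quiet moment:

```sql
BEGIN;

-- Takes an ACCESS EXCLUSIVE lock on "location" until ROLLBACK.
DROP INDEX location_displaycount_index;

EXPLAIN ANALYZE
SELECT *
FROM   location
WHERE  to_tsvector('simple', unaccent2(city))
    @@ to_tsquery('simple', unaccent2('wroclaw'))
ORDER  BY displaycount
LIMIT  20;

-- Undo the drop; the index is back exactly as before.
ROLLBACK;
```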
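By the way: if the theory is sound, your query (as you have it now) will be faster with a less selective search term in the FTS condition, contrary to what you would expect. Try it.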
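A rough intuition for this, under the simplifying assumption that matching rows are scattered evenly in displaycount order: if k of the table's N rows match the search term, the index scan on location_displaycount_index has to visit about

```latex
n_{\text{visited}} \approx 20 \cdot \frac{N}{k}
```

rows before it has collected 20 matches. A less selective term means a larger k and therefore fewer rows visited before LIMIT 20 is satisfied. N and k are illustrative symbols here, not values taken from the question.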

Quoted from: https://dba.stackexchange.com/questions/19726
