Postgresql

Postgres 帶有 order by、index 和 limit 的慢查詢

  • November 7, 2017

我正在嘗試提高 postgres(9.6) 查詢的性能。這是我的架構,表包含大約 6000 萬行。

         Column          |            Type             | Modifiers
--------------------------+-----------------------------+-----------
transaction_id           | text                        | not null
network_merchant_name    | text                        |
network_merchant_id      | text                        |
network_merchant_mcc     | integer                     |
network_merchant_country | text                        |
issuer_country           | text                        |
merchant_id              | text                        |
remapped_merchant_id     | text                        |
created_at               | timestamp without time zone |
updated_at               | timestamp without time zone |
remapped_at              | timestamp without time zone |
Indexes:
   "mapped_transactions_pkey" PRIMARY KEY, btree (transaction_id)
   "ix_mapped_transactions_remapped_at" btree (remapped_at NULLS FIRST)

這是我要執行的查詢。

SELECT *
FROM mapped_transactions
ORDER BY remapped_at ASC NULLS FIRST
LIMIT 10000;

這是查詢計劃:

   QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.57..1511.67 rows=10000 width=146) (actual time=327049.374..327345.341 rows=10000 loops=1)
  Buffers: shared hit=574937 read=210425 dirtied=356 written=4457
  I/O Timings: read=146625.381 write=59.637
  ->  Index Scan using ix_mapped_transactions_remapped_at on mapped_transactions  (cost=0.57..16190862.91 rows=107145960 width=146) (actual time=327049.364..327339.402 rows=10000 loops=1)
        Buffers: shared hit=574937 read=210425 dirtied=356 written=4457
        I/O Timings: read=146625.381 write=59.637
Planning time: 0.125 ms
Execution time: 327348.322 ms
(8 rows)  

我不明白為什麼remapped_at列上有索引需要這麼多時間。

另一方面,如果我反向訂購,它會很快。

SELECT *
FROM mapped_transactions
ORDER BY remapped_at DESC NULLS LAST
LIMIT 10000;

計劃是:

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.57..1511.67 rows=10000 width=146) (actual time=0.020..9.268 rows=10000 loops=1)
  Buffers: shared hit=1802
  ->  Index Scan Backward using ix_mapped_transactions_remapped_at on mapped_transactions  (cost=0.57..16190848.04 rows=107145866 width=146) (actual time=0.018..4.759 rows=10000 loops=1)
        Buffers: shared hit=1802
Planning time: 0.080 ms
Execution time: 11.561 ms
(6 rows)

有人可以幫助我如何提高第一個查詢的性能嗎?

更新

  1. 我通過重建表和重新索引數據解決了這個問題。VACUUM FULL ANALYZE不是一個選項,因為該表正在被使用,我不想鎖定它。
  2. 該指數的表現正在迅速惡化。我在 7 小時前重建了索引,性能還不錯。現在查詢在大約 10 秒內得到答复。請注意,此表寫得很重。如何使索引快速?我是否必須經常重新索引表?此表中沒有刪除,但有很多更新。

我又進行了一些測試,似乎這個解決方案對我不起作用。隨著時間的推移,索引會不斷降級,如果我不重新索引(或刪除並再次創建),我想要執行的查詢將變得越來越慢。有一些方法可以在不阻塞其他讀取/寫入的情況下創建索引,但目前我不會使用它們,因為它們不容易擴展。我無法用我所擁有的知識解決這個問題,所以我將採用另一種方​​法。這種方法使用具有恆定列且永遠不會更新的索引。感謝大家幫助我。

我無法確認 PostgreSQL 9.5。所以這可能是一種回歸。樣本數據,

CREATE TABLE mapped_transactions(remapped_at)
AS
 SELECT CASE WHEN x::int % 42 = 0 THEN null ELSE x::int END
 FROM generate_series( 1, 40e6 )
   AS gs(x);

CREATE INDEX ON mapped_transactions (remapped_at NULLS FIRST);
VACUUM ANALYZE mapped_transactions;

我們在這裡做NULLS FIRST,它有效,

EXPLAIN ANALYZE
SELECT *
FROM mapped_transactions
ORDER BY remapped_at ASC NULLS FIRST LIMIT 10000;
                                                                                    QUERY PLAN                                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.56..260.24 rows=10000 width=4) (actual time=0.069..4.308 rows=10000 loops=1)
  ->  Index Only Scan using mapped_transactions_remapped_at_idx on mapped_transactions  (cost=0.56..1038716.81 rows=40000016 width=4) (actual time=0.067..2.740 rows=10000 loops=1)
        Heap Fetches: 0
Planning time: 0.232 ms
Execution time: 5.017 ms
(5 rows)

DESC NULLS LAST

EXPLAIN ANALYZE
SELECT * FROM mapped_transactions
ORDER BY remapped_at DESC NULLS LAST LIMIT 10000;
                                                                                         QUERY PLAN                                                                                          
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.56..260.24 rows=10000 width=4) (actual time=0.073..4.429 rows=10000 loops=1)
  ->  Index Only Scan Backward using mapped_transactions_remapped_at_idx on mapped_transactions  (cost=0.56..1038716.81 rows=40000016 width=4) (actual time=0.071..2.865 rows=10000 loops=1)
        Heap Fetches: 0
Planning time: 0.114 ms
Execution time: 5.137 ms
(5 rows)

您可能需要編寫一個測試案例。不清楚你正在經歷什麼。

引用自:https://dba.stackexchange.com/questions/190111