How to index/optimize the sum of two columns in PostgreSQL?

April 11, 2019

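We have a database in which one table holds serialized temporary data that must be kept for varying lengths of time (typically between tens of minutes and two weeks). We also have a low-priority background process that deletes old rows from the table. The background process deletes at most 1000 rows in a single transaction: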
delete from temporarydata
   where id in (
       select id from temporarydata
           where (created + ttl) <= 1553755330 limit 1000
   )
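In this example, 1553755330 is the current time in seconds since the UNIX epoch, created holds the time the data was added (also in seconds since the UNIX epoch), and ttl holds the number of seconds the data should be kept alive.
Technically this works, but temporarydata has about 2M rows and the subselect becomes very slow, because evaluating the sum requires a sequential scan of the whole table to find all matching rows. This puts extra background load on the database.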
> explain (analyze,verbose,timing,buffers) select id from temporarydata
       where (created + ttl) <= 1553755330 limit 1000

Limit  (cost=0.00..402.34 rows=1000 width=16) (actual time=6735.811..6735.811 rows=0 loops=1)
 Output: id
 Buffers: shared hit=3068 read=230500
 ->  Seq Scan on public.temporarydata  (cost=0.00..262980.99 rows=653622 width=16) (actual time=6735.809..6735.809 rows=0 loops=1)
       Output: id
       Filter: ((temporarydata.created + temporarydata.ttl) <= 1553755330)
       Rows Removed by Filter: 1916405
       Buffers: shared hit=3068 read=230500
Planning time: 0.402 ms
Execution time: 6735.849 ms
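I would prefer to simply add a new index that always contains the sum created + ttl and that PostgreSQL can use automatically for this query. Can this be done in a performant way?
(I am considering rewriting the application code to store created and expires instead of ttl, where expires is created + ttl. The logical ttl would then be computed as the difference between those two values. I don't think the application issues any heavy queries on ttl alone.)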
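The refactoring considered here could be sketched roughly as follows. This is only an illustration under assumptions: the bigint type for expires and the index name temporarydata_expires_col_idx are hypothetical, since the question does not show the table definition.

```sql
-- Assumption: created and ttl are integer seconds, so bigint is a safe type for expires.
ALTER TABLE temporarydata ADD COLUMN expires bigint;

-- Backfill existing rows; on a table with ~2M rows this may be worth doing in batches.
UPDATE temporarydata SET expires = created + ttl;

-- A plain b-tree index on the stored column (index name is hypothetical).
CREATE INDEX temporarydata_expires_col_idx ON temporarydata (expires);

-- The cleanup subselect then filters on the column directly.
SELECT id
  FROM temporarydata
 WHERE expires <= 1553755330
 LIMIT 1000;

-- The logical ttl remains available as a difference when needed.
SELECT id, expires - created AS ttl FROM temporarydata LIMIT 10;
```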

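I think refactoring the table to store expires would be a good idea. If you don't want to do that, you can create an expression index on temporarydata ((created + ttl)).
However, getting the planner to use this index may take some encouragement, because the statistics system may not naturally give it good enough estimates. Adding an ORDER BY to your subselect should provide that encouragement: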
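Creating the expression index might look like the sketch below. The index name temporarydata_created_ttl_idx is a hypothetical choice, and using CONCURRENTLY is an assumption that the table must stay writable while the index is built.

```sql
-- Build the expression index without blocking writes to the table.
-- Note: CREATE INDEX CONCURRENTLY cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY temporarydata_created_ttl_idx  -- hypothetical name
    ON temporarydata ((created + ttl));

-- Refresh statistics; ANALYZE also gathers statistics for indexed expressions,
-- which helps the planner estimate how many rows match the filter.
ANALYZE temporarydata;
```

The doubled parentheses are required: PostgreSQL expects an index expression that is not a plain column or function call to be wrapped in its own parentheses.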
where (created + ttl) <= 1553755330 order by (created + ttl) limit 1000
(Also, it seems sensible to delete the most-expired rows first. In fact, I don't know why you want the LIMIT at all.)
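Put together with the original statement, the background job's delete could be sketched as below. It assumes id uniquely identifies a row, which the question implies but does not state.

```sql
-- Delete up to 1000 rows per transaction, most-expired first.
-- The ORDER BY matches the indexed expression, so the planner can walk
-- the expression index in order and stop after 1000 entries.
DELETE FROM temporarydata
 WHERE id IN (
       SELECT id
         FROM temporarydata
        WHERE (created + ttl) <= 1553755330
        ORDER BY (created + ttl)
        LIMIT 1000
 );
```

Running the subselect under explain (analyze, buffers) again is a reasonable way to check whether the plan has switched from a Seq Scan to an index scan.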

Source: https://dba.stackexchange.com/questions/234521
