Efficient query to get the greatest value from a big table

March 25, 2022

Given the table:
   Column    |            Type             
id           | integer                     
latitude     | numeric(9,6)                
longitude    | numeric(9,6)                
speed        | integer                     
equipment_id | integer                     
created_at   | timestamp without time zone
Indexes:
   "geoposition_records_pkey" PRIMARY KEY, btree (id)
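The table has 20 million records, which is not a big number, relatively speaking. But it makes sequential scans slow.
How can I get the last record (max(created_at)) for each equipment_id?
I have tried the following two queries, with several variants, and I have read many answers on this topic: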
select max(created_at),equipment_id from geoposition_records group by equipment_id;

select distinct on (equipment_id) equipment_id,created_at 
 from geoposition_records order by equipment_id, created_at desc;
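I have also tried creating a btree index on (equipment_id, created_at), but Postgres finds that a seqscan is faster. Forcing enable_seqscan = off does not help either, since reading the index is as slow as the seq scan, probably worse.
The query must run periodically, always returning the latest record.
Using Postgres 9.3.
Explain/analyze (with 1.7 million records):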
set enable_seqscan=true;
explain analyze select max(created_at),equipment_id from geoposition_records group by equipment_id;
"HashAggregate  (cost=47803.77..47804.34 rows=57 width=12) (actual time=1935.536..1935.556 rows=58 loops=1)"
"  ->  Seq Scan on geoposition_records  (cost=0.00..39544.51 rows=1651851 width=12) (actual time=0.029..494.296 rows=1651851 loops=1)"
"Total runtime: 1935.632 ms"

set enable_seqscan=false;
explain analyze select max(created_at),equipment_id from geoposition_records group by equipment_id;
"GroupAggregate  (cost=0.00..2995933.57 rows=57 width=12) (actual time=222.034..11305.073 rows=58 loops=1)"
"  ->  Index Scan using geoposition_records_equipment_id_created_at_idx on geoposition_records  (cost=0.00..2987673.75 rows=1651851 width=12) (actual time=0.062..10248.703 rows=1651851 loops=1)"
"Total runtime: 11305.161 ms"

Index
A plain multicolumn B-tree index should work after all:
CREATE INDEX foo_idx
ON geoposition_records (equipment_id, created_at DESC NULLS LAST);
Why DESC NULLS LAST?
Unused index in query by range of dates
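As a rough illustration of why the sort order matters: created_at is not declared NOT NULL in the table shown, and Postgres sorts NULLs first by default under DESC, so a plain "created_at desc" could surface a NULL as the "latest" value and would not match the sort order of foo_idx. A minimal sketch, assuming the index above exists, is the question's DISTINCT ON query with its ORDER BY spelled the same way as the index:

```sql
-- Sketch only: same DISTINCT ON query as in the question, but with the
-- ORDER BY written to match foo_idx (equipment_id, created_at DESC NULLS LAST).
-- Rows with a NULL created_at now sort after all real timestamps per device.
SELECT DISTINCT ON (equipment_id)
       equipment_id, created_at
FROM   geoposition_records
ORDER  BY equipment_id, created_at DESC NULLS LAST;
```

With many rows per equipment_id the planner may still prefer a sequential scan here, which is why the approaches below start from a list of distinct equipment_id values instead.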
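Is it safe to assume you have an equipment table? Then performance won't be a problem:
Correlated subquery
Based on this equipment table, you can run a lowly correlated subquery to great effect: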
SELECT equipment_id
    , (SELECT created_at
       FROM   geoposition_records
       WHERE  equipment_id = eq.equipment_id
       ORDER  BY created_at DESC NULLS LAST
       LIMIT  1) AS latest
FROM   equipment eq;
For a small number of rows in the equipment table (57, judging from your EXPLAIN ANALYZE output), this is very fast.
LATERAL join in Postgres 9.3+
SELECT eq.equipment_id, r.latest
FROM   equipment eq
LEFT   JOIN LATERAL (
  SELECT created_at
  FROM   geoposition_records
  WHERE  equipment_id = eq.equipment_id
  ORDER  BY created_at DESC NULLS LAST
  LIMIT  1
  ) r(latest) ON true;
Detailed explanation:
Optimize GROUP BY query to retrieve latest record per user
Performance is similar to the correlated subquery.
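The question asks for the last record per device, while the queries above return only created_at. A LATERAL subquery can return several columns, so one possible extension, sketched here under the same assumptions (an equipment table with an equipment_id column, and the index from above), pulls the whole latest row:

```sql
-- Sketch only: return the full latest row per device, not just its timestamp.
-- Column names are taken from the table definition in the question.
SELECT eq.equipment_id, r.id, r.latitude, r.longitude, r.speed, r.created_at
FROM   equipment eq
LEFT   JOIN LATERAL (
   SELECT g.id, g.latitude, g.longitude, g.speed, g.created_at
   FROM   geoposition_records g
   WHERE  g.equipment_id = eq.equipment_id
   ORDER  BY g.created_at DESC NULLS LAST
   LIMIT  1
   ) r ON true;
```

LEFT JOIN ... ON true keeps devices that have no rows in geoposition_records; their r columns come back as NULL.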
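Function
If you can't talk sense into the query planner (which shouldn't happen), a function looping through the equipment table is certain to do the trick. Looking up one equipment_id at a time uses the index.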
CREATE OR REPLACE FUNCTION f_latest_equip()
 RETURNS TABLE (equipment_id int, latest timestamp)
 LANGUAGE plpgsql STABLE AS
$func$
BEGIN
  FOR equipment_id IN
     SELECT e.equipment_id FROM equipment e ORDER BY 1
  LOOP
     SELECT g.created_at
     FROM   geoposition_records g
     WHERE  g.equipment_id = f_latest_equip.equipment_id
                          -- prepend function name to disambiguate
     ORDER  BY g.created_at DESC NULLS LAST
     LIMIT  1
     INTO   latest;

     RETURN NEXT;
  END LOOP;
END  
$func$;
Makes for a nice call, too:
SELECT * FROM f_latest_equip();
Performance comparison:
db<>fiddle here
Old sqlfiddle

Attempt 1
If
I have a separate equipment table, and
I have an index on geoposition_records(equipment_id, created_at desc)
then the following works for me:
select id as equipment_id, (select max(created_at)
                           from geoposition_records
                           where equipment_id = equipment.id
                          ) as max_created_at
from equipment;
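I wasn't able to force PG to do a fast query that determines both the list of equipment_ids and the related max(created_at). But I'm going to try again tomorrow!
Attempt 2
I found this link: http://zogovic.com/post/44856908222/optimizing-postgresql-query-for-distinct-values Combining this technique with my query from attempt 1, I get: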
WITH RECURSIVE equipment(id) AS (
   SELECT MIN(equipment_id) FROM geoposition_records
 UNION
   SELECT (
     SELECT equipment_id
     FROM geoposition_records
     WHERE equipment_id > equipment.id
     ORDER BY equipment_id
     LIMIT 1
   )
   FROM equipment WHERE id IS NOT NULL
)
SELECT id AS equipment_id, (SELECT MAX(created_at)
                           FROM geoposition_records
                           WHERE equipment_id = equipment.id
                          ) AS max_created_at
FROM equipment;
This works FAST! But you need
this ultra-twisted query form, and
an index on geoposition_records(equipment_id, created_at desc).

Quoted from: https://dba.stackexchange.com/questions/51895
