如果鍵不為空，Postgres 按鍵減少

February 18, 2016

我需要選擇 N 行，seq > X其中reduce_key is null每個行組相同，reduce_key選擇最大的行組seq。結果應該是嚴格順序的（但reduce_key除了最大的行之外，過濾的行相同seq）並限制為 N。

這是一個執行此操作的查詢：

explain analyze SELECT
   x2."user_id",
   x2."seq",
   x2."timestamp",
   x2."reduce_key",
   x2."mapping"
FROM
   "user_sequence" x2
WHERE
   (
       (x2."user_id" = 860862404)
       AND (x2."seq" &gt; 3562974)
   )
AND (
   (x2."reduce_key" IS NULL)
   OR (
       x2."seq" IN (
           SELECT
               MAX ("seq")
           FROM
               "user_sequence"
           WHERE
               ("user_id" = 860862404)
           AND (
               ("seq" &gt;= x2."seq")
               AND ("reduce_key" = x2."reduce_key")
           )
           GROUP BY
               "reduce_key" 
       )
   )
)
ORDER BY
   x2."seq"
LIMIT 1000

問題是它在具有數千萬行的表上執行約 70 毫秒。有沒有更有效的方法來通過鍵來減少行，使行帶有空鍵？

表結構：

actor=# \d user_sequence;
  Table "public.user_sequence"
  Column   |  Type   | Modifiers 
------------+---------+-----------
user_id    | integer | not null
seq        | integer | not null
timestamp  | bigint  | not null
mapping    | bytea   | not null
reduce_key | text    | 
Indexes:
   "user_sequence_pkey" PRIMARY KEY, btree (user_id, seq)
   "user_sequence_user_id_reduce_key_seq_idx" btree (user_id, reduce_key, seq)

EXPLAIN ANALYZE:

Limit  (cost=0.57..231389.06 rows=1000 width=103) (actual time=0.408..74.754 rows=308 loops=1)
  -&gt;  Index Scan using user_sequence_pkey on user_sequence x2  (cost=0.57..24444343.74 rows=105642 width=103) (act
ual time=0.406..74.645 rows=308 loops=1)
        Index Cond: ((user_id = 860862404) AND (seq &gt; 3562974))
        Filter: ((reduce_key IS NULL) OR (SubPlan 1))
        Rows Removed by Filter: 692
        SubPlan 1
          -&gt;  GroupAggregate  (cost=0.57..447.59 rows=1 width=33) (actual time=0.096..0.097 rows=1 loops=730)
                Group Key: user_sequence.reduce_key
                -&gt;  Index Only Scan using user_sequence_user_id_reduce_key_seq_idx on user_sequence  (cost=0.57..4
47.03 rows=110 width=33) (actual time=0.036..0.081 rows=68 loops=730)
                      Index Cond: ((user_id = 860862404) AND (reduce_key = x2.reduce_key) AND (seq &gt;= x2.seq))
                      Heap Fetches: 49371
Planning time: 0.808 ms
Execution time: 74.890 ms
(13 rows)

我認為問題出在過濾器上，但如何使它使用索引？

我正在使用 PostgreSQL 9.4。

假設您想要目前查詢的功能（這似乎與您的描述不同）。
你問：
我認為問題出在過濾器上，但如何使它使用索引？
您的查詢計劃表明 Postgres 已經在每一步都使用索引。添加的FILTER步驟僅根據您的附加謂詞過濾行。
起初我發布了一個不同的想法，但這是缺失的ORDER BY seq。這個使用簡單NOT EXISTS反半連接的查詢應該是一個更好的等價物：
SELECT user_id
    , seq
    , timestamp
    , reduce_key
    , mapping
FROM   user_sequence u
WHERE  user_id = 860862404
AND    seq &gt; 3562974
AND    NOT EXISTS (
  SELECT 1
  FROM   user_sequence
  WHERE  user_id = u.user_id
  AND    reduce_key = u.reduce_key
  AND    seq &gt; u.seq
  )
ORDER  BY seq
LIMIT  1000;
您現有的索引看起來很適合它。對於最後一項，第二個可能會更高效一些：
user_sequence_user_id_reduce_key_seq_idx btree (user_id, reduce_key, seq **DESC**)
通常，使用會有一個更簡單的替代方案DISTINCT ON，但是當您想要所有行時，它會將所有行折疊reduce_key IS NULL成一行。
相關答案和更多解釋：
如何有效地獲取“最近的對應行”？
旁白：你有很多不必要的括號和雙引號。也許是一個低效的 ORM？

引用自：https://dba.stackexchange.com/questions/127670

如果鍵不為空，Postgres 按鍵減少

相關問答

如何優化 Postgres 查詢中的排序

使用大 IN 優化 Postgres 查詢

大表中的慢速索引掃描

使用 GIN 索引位串

優化對 690,000 行表的昂貴的 GROUP BY / ORDER BY 查詢

刪除大量行的最佳方法，知道欄位不刪除