Postgresql

如果鍵不為空,Postgres 按鍵減少

  • February 18, 2016

我需要選擇 N 行,seq > X其中reduce_key is null每個行組相同,reduce_key選擇最大的行組seq。結果應該是嚴格順序的(但reduce_key除了最大的行之外,過濾的行相同seq)並限制為 N。

這是一個執行此操作的查詢:

explain analyze SELECT
   x2."user_id",
   x2."seq",
   x2."timestamp",
   x2."reduce_key",
   x2."mapping"
FROM
   "user_sequence" x2
WHERE
   (
       (x2."user_id" = 860862404)
       AND (x2."seq" > 3562974)
   )
AND (
   (x2."reduce_key" IS NULL)
   OR (
       x2."seq" IN (
           SELECT
               MAX ("seq")
           FROM
               "user_sequence"
           WHERE
               ("user_id" = 860862404)
           AND (
               ("seq" >= x2."seq")
               AND ("reduce_key" = x2."reduce_key")
           )
           GROUP BY
               "reduce_key" 
       )
   )
)
ORDER BY
   x2."seq"
LIMIT 1000

問題是它在具有數千萬行的表上執行約 70 毫秒。有沒有更有效的方法來通過鍵來減少行,使行帶有空鍵?

表結構:

actor=# \d user_sequence;
  Table "public.user_sequence"
  Column   |  Type   | Modifiers 
------------+---------+-----------
user_id    | integer | not null
seq        | integer | not null
timestamp  | bigint  | not null
mapping    | bytea   | not null
reduce_key | text    | 
Indexes:
   "user_sequence_pkey" PRIMARY KEY, btree (user_id, seq)
   "user_sequence_user_id_reduce_key_seq_idx" btree (user_id, reduce_key, seq)

EXPLAIN ANALYZE:

Limit  (cost=0.57..231389.06 rows=1000 width=103) (actual time=0.408..74.754 rows=308 loops=1)
  ->  Index Scan using user_sequence_pkey on user_sequence x2  (cost=0.57..24444343.74 rows=105642 width=103) (act
ual time=0.406..74.645 rows=308 loops=1)
        Index Cond: ((user_id = 860862404) AND (seq > 3562974))
        Filter: ((reduce_key IS NULL) OR (SubPlan 1))
        Rows Removed by Filter: 692
        SubPlan 1
          ->  GroupAggregate  (cost=0.57..447.59 rows=1 width=33) (actual time=0.096..0.097 rows=1 loops=730)
                Group Key: user_sequence.reduce_key
                ->  Index Only Scan using user_sequence_user_id_reduce_key_seq_idx on user_sequence  (cost=0.57..4
47.03 rows=110 width=33) (actual time=0.036..0.081 rows=68 loops=730)
                      Index Cond: ((user_id = 860862404) AND (reduce_key = x2.reduce_key) AND (seq >= x2.seq))
                      Heap Fetches: 49371
Planning time: 0.808 ms
Execution time: 74.890 ms
(13 rows)

我認為問題出在過濾器上,但如何使它使用索引?

我正在使用 PostgreSQL 9.4。

假設您想要目前查詢的功能(這似乎與您的描述不同)。

你問:

我認為問題出在過濾器上,但如何使它使用索引?

您的查詢計劃表明 Postgres 已經在每一步都使用索引。添加的FILTER步驟僅根據您的附加謂詞過濾行。

起初我發布了一個不同的想法,但這是缺失的ORDER BY seq。這個使用簡單NOT EXISTS反半連接的查詢應該是一個更好的等價物:

SELECT user_id
    , seq
    , timestamp
    , reduce_key
    , mapping
FROM   user_sequence u
WHERE  user_id = 860862404
AND    seq > 3562974
AND    NOT EXISTS (
  SELECT 1
  FROM   user_sequence
  WHERE  user_id = u.user_id
  AND    reduce_key = u.reduce_key
  AND    seq > u.seq
  )
ORDER  BY seq
LIMIT  1000;

您現有的索引看起來很適合它。對於最後一項,第二個可能會更高效一些:

user_sequence_user_id_reduce_key_seq_idx btree (user_id, reduce_key, seq **DESC**)

通常,使用 會有一個更簡單的替代方案DISTINCT ON,但是當您想要所有行時,它會將所有行折疊reduce_key IS NULL成一行。

相關答案和更多解釋:

旁白:你有很多不必要的括號和雙引號。也許是一個低效的 ORM?

引用自:https://dba.stackexchange.com/questions/127670