Postgresql
如果鍵不為空,Postgres 按鍵減少
我需要選擇 N 行,
seq > X
其中reduce_key is null
每個行組相同,reduce_key
選擇最大的行組seq
。結果應該是嚴格順序的(但reduce_key
除了最大的行之外,過濾的行相同seq
)並限制為 N。這是一個執行此操作的查詢:
explain analyze SELECT x2."user_id", x2."seq", x2."timestamp", x2."reduce_key", x2."mapping" FROM "user_sequence" x2 WHERE ( (x2."user_id" = 860862404) AND (x2."seq" > 3562974) ) AND ( (x2."reduce_key" IS NULL) OR ( x2."seq" IN ( SELECT MAX ("seq") FROM "user_sequence" WHERE ("user_id" = 860862404) AND ( ("seq" >= x2."seq") AND ("reduce_key" = x2."reduce_key") ) GROUP BY "reduce_key" ) ) ) ORDER BY x2."seq" LIMIT 1000
問題是它在具有數千萬行的表上執行約 70 毫秒。有沒有更有效的方法來通過鍵來減少行,使行帶有空鍵?
表結構:
actor=# \d user_sequence; Table "public.user_sequence" Column | Type | Modifiers ------------+---------+----------- user_id | integer | not null seq | integer | not null timestamp | bigint | not null mapping | bytea | not null reduce_key | text | Indexes: "user_sequence_pkey" PRIMARY KEY, btree (user_id, seq) "user_sequence_user_id_reduce_key_seq_idx" btree (user_id, reduce_key, seq)
EXPLAIN ANALYZE
:Limit (cost=0.57..231389.06 rows=1000 width=103) (actual time=0.408..74.754 rows=308 loops=1) -> Index Scan using user_sequence_pkey on user_sequence x2 (cost=0.57..24444343.74 rows=105642 width=103) (act ual time=0.406..74.645 rows=308 loops=1) Index Cond: ((user_id = 860862404) AND (seq > 3562974)) Filter: ((reduce_key IS NULL) OR (SubPlan 1)) Rows Removed by Filter: 692 SubPlan 1 -> GroupAggregate (cost=0.57..447.59 rows=1 width=33) (actual time=0.096..0.097 rows=1 loops=730) Group Key: user_sequence.reduce_key -> Index Only Scan using user_sequence_user_id_reduce_key_seq_idx on user_sequence (cost=0.57..4 47.03 rows=110 width=33) (actual time=0.036..0.081 rows=68 loops=730) Index Cond: ((user_id = 860862404) AND (reduce_key = x2.reduce_key) AND (seq >= x2.seq)) Heap Fetches: 49371 Planning time: 0.808 ms Execution time: 74.890 ms (13 rows)
我認為問題出在過濾器上,但如何使它使用索引?
我正在使用 PostgreSQL 9.4。
假設您想要目前查詢的功能(這似乎與您的描述不同)。
你問:
我認為問題出在過濾器上,但如何使它使用索引?
您的查詢計劃表明 Postgres 已經在每一步都使用索引。添加的
FILTER
步驟僅根據您的附加謂詞過濾行。起初我發布了一個不同的想法,但這是缺失的
ORDER BY seq
。這個使用簡單NOT EXISTS
反半連接的查詢應該是一個更好的等價物:SELECT user_id , seq , timestamp , reduce_key , mapping FROM user_sequence u WHERE user_id = 860862404 AND seq > 3562974 AND NOT EXISTS ( SELECT 1 FROM user_sequence WHERE user_id = u.user_id AND reduce_key = u.reduce_key AND seq > u.seq ) ORDER BY seq LIMIT 1000;
您現有的索引看起來很適合它。對於最後一項,第二個可能會更高效一些:
user_sequence_user_id_reduce_key_seq_idx btree (user_id, reduce_key, seq **DESC**)
通常,使用 會有一個更簡單的替代方案
DISTINCT ON
,但是當您想要所有行時,它會將所有行折疊reduce_key IS NULL
成一行。相關答案和更多解釋:
旁白:你有很多不必要的括號和雙引號。也許是一個低效的 ORM?