指定 JOIN 和 LIMIT 時 SELECT 非常慢,但找不到/很少找到結果
我遇到了一個問題,即查詢計劃器沒有考慮 JOIN 如何影響特定值的頻率。
對於某些情況:我將會話儲存在僅附加日誌中,其中每個會話將有多個條目(包含會話的所有數據)每個更新一個。
create table session( update_id bigint not null constraint "PK_gameSession_update" primary key, event_id bigint not null, event_type varchar(255) not null, time_in_millis timestamp with time zone not null, game_code varchar(100) not null, account_id bigint, session_id bigint not null, game_id bigint not null, session_client_ref varchar(255) not null, session_external_ref varchar(255) not null, player_external_ref varchar(255), community_id bigint not null, community_code varchar(255) not null, device text, type varchar(50) not null, country_code varchar(10), game_after_state jsonb, accept_language char(5) not null, currency varchar(255), status varchar(50) not null, provider_metadata jsonb, operator_code varchar(255), jurisdiction varchar(255), pending_round_id bigint, provider_managed_pending_round boolean, time_started timestamp with time zone not null, last_time_updated timestamp with time zone not null, action varchar(30) not null, seconds_to_idle bigint, provider_maximum_concurrent_sessions_per_game integer, referrer text ); create index "IDX_session_accept_language" on session (accept_language); create index "IDX_session_account_id" on session (account_id); create index "IDX_session_action" on session (action); create index "IDX_session_community_code" on session (community_code); create index "IDX_session_community_id" on session (community_id); create index "IDX_session_country_code" on session (country_code); create index "IDX_session_currency" on session (currency); create index "IDX_session_game_after_state" on session (game_after_state); create index "IDX_session_game_code" on session (game_code); create index "IDX_session_game_id" on session (game_id); create index "IDX_session_jurisdiction" on session (jurisdiction); create index "IDX_session_last_time_updated" on session (last_time_updated); create index "IDX_session_max_concurrent_sessions_per_game" on session (provider_maximum_concurrent_sessions_per_game); create index "IDX_session_operator_code" on session (operator_code); create index "IDX_session_pending_round_id" on session (pending_round_id); create index "IDX_session_player_external_ref" on session (player_external_ref); create index "IDX_session_provider_managed_pending_round" on session (provider_managed_pending_round); create index "IDX_session_provider_metadata" on session (provider_metadata); create index "IDX_session_referrer" on session (referrer); create index "IDX_session_seconds_to_idle" on session (seconds_to_idle); create index "IDX_session_session_client_ref" on session (session_client_ref); create index "IDX_session_session_external_ref" on session (session_external_ref); create index "IDX_session_status" on session (status); create index "IDX_session_time_started" on session (time_started); create index "IDX_session_type" on session (type); create index "IDX_session_update_community_id" on session (update_id, community_id); create index "IDX_session_update_partial_status_started" on session (update_id) where ((status) :: text = 'STARTED' :: text); create index "IDX_session_session_id_player_external_ref" on session (session_id, player_external_ref); create index "IDX_session_session_id" on session (session_id);
為了跟踪“最新”資訊,我有第二張表來跟踪這一點。
create table sessionlatest ( session_id bigint not null constraint "PK_session_id_latest" primary key, update_id bigint not null, event_id bigint not null, time_in_millis timestamp with time zone not null ); create index "IDX_sessionlatest_update_id" on sessionlatest (update_id);
範例 給定一個已創建的會話,然後更新兩次並最終完成,將生成以下數據,其中 update_id 顯示事件更改發生的順序:
會話表
+-----------+------------+-----------+ | update_id | session_id | status | +-----------+------------+-----------+ | 1 | 1 | STARTED | | 2 | 1 | STARTED | | 3 | 1 | STARTED | | 4 | 1 | COMPLETED | +-----------+------------+-----------+
見最新表:
+------------+-----------+--+ | session_id | update_id | | +------------+-----------+--+ | 1 | 4 | | +------------+-----------+--+
這樣我就知道如果我查找會話 1 的 update_id 4,那將是會話的最新(正確)視圖。但我仍然可以查找它的變化歷史。
基數
會話中約 500 萬行
會話中約 97% 的行狀態為“已啟動”
sessionlatest 中約 50K 行
~10-50 行加入會話和 sessionlatest(即大多數會話不再啟動)
我的目標是獲取目前“已啟動”的會話列表,即它們的最新更新狀態為“已啟動”
Postgres 版本 9.6.10
查詢
問題查詢:
EXPLAIN ANALYSE SELECT * FROM session gs join "public"."sessionlatest" gsl on gs."update_id" = gsl."update_id" WHERE gs."status" = 'STARTED' order by gs."session_id" desc LIMIT 500; Limit (cost=0.85..37681.96 rows=500 width=909) (actual time=0.137..45960.666 rows=29 loops=1) -> Nested Loop (cost=0.85..2996779.86 rows=39765 width=909) (actual time=0.135..45960.644 rows=29 loops=1) -> Index Scan Backward using "IDX_session_session_id" on session gs (cost=0.43..808987.56 rows=4884873 width=869) (actual time=0.038..32928.897 rows=4848024 loops=1) Filter: ((status)::text = 'STARTED'::text) Rows Removed by Filter: 115172 -> Index Scan using "IDX_sessionlatest_update_id" on sessionlatest gsl (cost=0.41..0.44 rows=1 width=32) (actual time=0.002..0.002 rows=0 loops=4848024) Index Cond: (update_id = gs.update_id) Planning time: 1.108 ms Execution time: 45960.839 ms
無限制查詢:
EXPLAIN ANALYSE SELECT * FROM session gs join "public"."sessionlatest" gsl on gs."update_id" = gsl."update_id" WHERE gs."status" = 'STARTED' order by gs."session_id" desc; Sort (cost=249561.70..249661.11 rows=39765 width=909) (actual time=145.306..145.317 rows=28 loops=1) Sort Key: gs.session_id DESC Sort Method: quicksort Memory: 55kB -> Nested Loop (cost=0.43..230619.81 rows=39765 width=909) (actual time=69.713..145.205 rows=28 loops=1) -> Seq Scan on sessionlatest gsl (cost=0.00..743.23 rows=40723 width=32) (actual time=0.010..5.386 rows=40732 loops=1) -> Index Scan using "IDX_session_update_partial_status_started" on session gs (cost=0.43..5.63 rows=1 width=869) (actual time=0.003..0.003 rows=0 loops=40732) Index Cond: (update_id = gsl.update_id) Planning time: 0.955 ms Execution time: 145.431 ms
無序查詢:
EXPLAIN ANALYSE SELECT * FROM session gs join "public"."sessionlatest" gsl on gs."update_id" = gsl."update_id" WHERE gs."status" = 'STARTED' LIMIT 500; Limit (cost=0.43..2900.21 rows=500 width=901) (actual time=70.971..149.729 rows=28 loops=1) -> Nested Loop (cost=0.43..230619.81 rows=39765 width=901) (actual time=70.970..149.721 rows=28 loops=1) -> Seq Scan on sessionlatest gsl (cost=0.00..743.23 rows=40723 width=32) (actual time=0.011..5.655 rows=40732 loops=1) -> Index Scan using "IDX_session_update_partial_status_started" on session gs (cost=0.43..5.63 rows=1 width=869) (actual time=0.003..0.003 rows=0 loops=40732) Index Cond: (update_id = gsl.update_id) Planning time: 0.874 ms Execution time: 149.820 ms
在我的案例中,我需要排序和限制。經過一些研究,我認為由於狀態“已啟動”非常常見,因此查詢計劃程序更喜歡在有限制的情況下進行順序掃描(因為它假設它將滿足前幾行的限制)。這是不正確的,因為在加入 sessionlatest 表後,永遠不會滿足限制。
我嘗試添加多列索引,但它被忽略了。
CREATE INDEX idx_session_id_status ON public.session(session_id, status);
我也嘗試了部分索引,但性能改進很小。
CREATE INDEX idx_session_id_partial_status ON public.session(session_id) WHERE public.session.status='STARTED'; Limit (cost=0.72..34319.23 rows=500 width=903) -> Nested Loop (cost=0.72..1145415.36 rows=16688 width=903) -> Index Scan Backward using idx_session_id_partial_status on session gs (cost=0.43..435253.15 rows=2252261 width=863) -> Index Scan using "IDX_Sessionlatest_update_id" on sessionlatest gsl (cost=0.29..0.31 rows=1 width=32) Index Cond: (update_id = gs.update_id)
最後我看到一個類似的問題SELECT very slow when no results and a LIMIT is specified主要區別是我的極端情況是由連接而不是多個過濾器創建的(我顯然不能在不同的表中創建索引)
您可以嘗試幾件事:
- 添加
status
入sessionlatest
表。由於此表就像最新會話的“記憶體”,因此這樣做絕對有意義(@jjanus 建議)。所有這些WHERE - ORDER BY - LIMIT
都涉及來自同一個(和更小的)表的列,因此可以有效地使用合適的索引(然後連接到大表只需找到相關的 500 行)。- 我不明白你為什麼要加入
update_id
而不是session_id
。這對執行計劃沒有很好的影響。您想要ORDER BY
並且LIMIT
通過,session_id
但連接在另一列上。也許您這樣做是為了在每個會話中獲得一個條目。可以重寫查詢以session_id
替代使用。- 我會嘗試
LATERAL JOIN
從較小的桌子開始使用。看到這個(使用
update_id
或session_id
兩者):select gs.*, gsl.* from sessionlatest gsl join lateral ( select s.* from session s where s.session_id = gsl.session_id and s.update_id = gsl.update_id order by s.update_id desc LIMIT 1 ) as gs on gs.status = 'STARTED' order by gsl.session_id desc limit 500 ;
我認為如果你有一個索引,上面會產生一個更好的執行計劃,
session (session_id, update_id)
但是如果你用你的表進行測試會更好(大小,分佈,有幾件事對性能很重要)。請參閱dbfiddle.uk。