PostgreSQL

SELECT very slow when JOIN and LIMIT are specified but few or no results are found

  • February 18, 2019

I have run into an issue where the query planner does not take into account how a JOIN affects the frequency of a particular value.

For some context: I store sessions in an append-only log, where each session has multiple entries (each containing all of the session's data), one per update.

create table session(
 update_id                                     bigint                   not null
   constraint "PK_gameSession_update"
   primary key,
 event_id                                      bigint                   not null,
 event_type                                    varchar(255)             not null,
 time_in_millis                                timestamp with time zone not null,
 game_code                                     varchar(100)             not null,
 account_id                                    bigint,
 session_id                                    bigint                   not null,
 game_id                                       bigint                   not null,
 session_client_ref                            varchar(255)             not null,
 session_external_ref                          varchar(255)             not null,
 player_external_ref                           varchar(255),
 community_id                                  bigint                   not null,
 community_code                                varchar(255)             not null,
 device                                        text,
 type                                          varchar(50)              not null,
 country_code                                  varchar(10),
 game_after_state                              jsonb,
 accept_language                               char(5)                  not null,
 currency                                      varchar(255),
 status                                        varchar(50)              not null,
 provider_metadata                             jsonb,
 operator_code                                 varchar(255),
 jurisdiction                                  varchar(255),
 pending_round_id                              bigint,
 provider_managed_pending_round                boolean,
 time_started                                  timestamp with time zone not null,
 last_time_updated                             timestamp with time zone not null,
 action                                        varchar(30)              not null,
 seconds_to_idle                               bigint,
 provider_maximum_concurrent_sessions_per_game integer,
 referrer                                      text
);

create index "IDX_session_accept_language" on session (accept_language);
create index "IDX_session_account_id" on session (account_id);
create index "IDX_session_action" on session (action);
create index "IDX_session_community_code" on session (community_code);
create index "IDX_session_community_id" on session (community_id);
create index "IDX_session_country_code" on session (country_code);
create index "IDX_session_currency" on session (currency);
create index "IDX_session_game_after_state" on session (game_after_state);
create index "IDX_session_game_code" on session (game_code);
create index "IDX_session_game_id" on session (game_id);
create index "IDX_session_jurisdiction" on session (jurisdiction);
create index "IDX_session_last_time_updated" on session (last_time_updated);
create index "IDX_session_max_concurrent_sessions_per_game" on session (provider_maximum_concurrent_sessions_per_game);
create index "IDX_session_operator_code" on session (operator_code);
create index "IDX_session_pending_round_id" on session (pending_round_id);
create index "IDX_session_player_external_ref" on session (player_external_ref);
create index "IDX_session_provider_managed_pending_round" on session (provider_managed_pending_round);
create index "IDX_session_provider_metadata" on session (provider_metadata);
create index "IDX_session_referrer" on session (referrer);
create index "IDX_session_seconds_to_idle" on session (seconds_to_idle);
create index "IDX_session_session_client_ref" on session (session_client_ref);
create index "IDX_session_session_external_ref" on session (session_external_ref);
create index "IDX_session_status" on session (status);
create index "IDX_session_time_started" on session (time_started);
create index "IDX_session_type" on session (type);
create index "IDX_session_update_community_id" on session (update_id, community_id);
create index "IDX_session_update_partial_status_started" on session (update_id) where ((status) :: text = 'STARTED' :: text);
create index "IDX_session_session_id_player_external_ref"  on session (session_id, player_external_ref);
create index "IDX_session_session_id" on session (session_id);

To keep track of which entry is the "latest", I have a second table for that.

create table sessionlatest
(
 session_id bigint                   not null
   constraint "PK_session_id_latest"
   primary key,
 update_id       bigint                   not null,
 event_id        bigint                   not null,
 time_in_millis  timestamp with time zone not null
);

create index "IDX_sessionlatest_update_id" on sessionlatest (update_id);

Example: given a session that is created, then updated twice, and finally completed, the following data is generated, with update_id showing the order in which the changes happened:

session table

+-----------+------------+-----------+
| update_id | session_id |  status   |
+-----------+------------+-----------+
|         1 |          1 | STARTED   |
|         2 |          1 | STARTED   |
|         3 |          1 | STARTED   |
|         4 |          1 | COMPLETED |
+-----------+------------+-----------+

sessionlatest table:

+------------+-----------+
| session_id | update_id |
+------------+-----------+
|          1 |         4 |
+------------+-----------+

This way I know that if I look up update_id 4 for session 1, that is the latest (correct) view of the session, while I can still look up its history of changes.
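
For example, fetching the latest (correct) view of session 1 is just a join through sessionlatest (a sketch; the literal session id is only for illustration):

SELECT gs.*
FROM sessionlatest gsl
 join session gs
 on gs.update_id = gsl.update_id
WHERE gsl.session_id = 1;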

Cardinality

~5 million rows in session

~97% of the rows in session have status 'STARTED'

~50K rows in sessionlatest

~10-50 rows when session and sessionlatest are joined (i.e. most sessions are no longer STARTED)

My goal is to get the list of sessions that are currently STARTED, i.e. whose latest update has status 'STARTED'.

Postgres version 9.6.10
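
For reference, the skew and join selectivity quoted above can be checked directly; this is only a sketch, not part of the original question:

-- How skewed the planner believes 'status' to be:
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'session' AND attname = 'status';

-- Actual number of rows that survive the join plus the filter (expected ~10-50):
SELECT count(*)
FROM session gs
 join sessionlatest gsl
 on gs.update_id = gsl.update_id
WHERE gs.status = 'STARTED';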

Queries

The problematic query:

EXPLAIN ANALYSE SELECT *
FROM session gs
join "public"."sessionlatest" gsl
 on gs."update_id" = gsl."update_id"
WHERE gs."status" = 'STARTED'
order by gs."session_id" desc
LIMIT 500;

Limit  (cost=0.85..37681.96 rows=500 width=909) (actual time=0.137..45960.666 rows=29 loops=1)
 ->  Nested Loop  (cost=0.85..2996779.86 rows=39765 width=909) (actual time=0.135..45960.644 rows=29 loops=1)
       ->  Index Scan Backward using "IDX_session_session_id" on session gs  (cost=0.43..808987.56 rows=4884873 width=869) (actual time=0.038..32928.897 rows=4848024 loops=1)
             Filter: ((status)::text = 'STARTED'::text)
             Rows Removed by Filter: 115172
       ->  Index Scan using "IDX_sessionlatest_update_id" on sessionlatest gsl  (cost=0.41..0.44 rows=1 width=32) (actual time=0.002..0.002 rows=0 loops=4848024)
             Index Cond: (update_id = gs.update_id)
Planning time: 1.108 ms
Execution time: 45960.839 ms

The query without LIMIT:

EXPLAIN ANALYSE SELECT *
FROM session gs
join "public"."sessionlatest" gsl
 on gs."update_id" = gsl."update_id"
WHERE gs."status" = 'STARTED'
order by gs."session_id" desc;

Sort  (cost=249561.70..249661.11 rows=39765 width=909) (actual time=145.306..145.317 rows=28 loops=1)
 Sort Key: gs.session_id DESC
 Sort Method: quicksort  Memory: 55kB
 ->  Nested Loop  (cost=0.43..230619.81 rows=39765 width=909) (actual time=69.713..145.205 rows=28 loops=1)
       ->  Seq Scan on sessionlatest gsl  (cost=0.00..743.23 rows=40723 width=32) (actual time=0.010..5.386 rows=40732 loops=1)
       ->  Index Scan using "IDX_session_update_partial_status_started" on session gs  (cost=0.43..5.63 rows=1 width=869) (actual time=0.003..0.003 rows=0 loops=40732)
             Index Cond: (update_id = gsl.update_id)
Planning time: 0.955 ms
Execution time: 145.431 ms

The query without ORDER BY:

EXPLAIN ANALYSE SELECT *
FROM session gs
join "public"."sessionlatest" gsl
 on gs."update_id" = gsl."update_id"
WHERE gs."status" = 'STARTED'
LIMIT 500;

Limit  (cost=0.43..2900.21 rows=500 width=901) (actual time=70.971..149.729 rows=28 loops=1)
 ->  Nested Loop  (cost=0.43..230619.81 rows=39765 width=901) (actual time=70.970..149.721 rows=28 loops=1)
       ->  Seq Scan on sessionlatest gsl  (cost=0.00..743.23 rows=40723 width=32) (actual time=0.011..5.655 rows=40732 loops=1)
       ->  Index Scan using "IDX_session_update_partial_status_started" on session gs  (cost=0.43..5.63 rows=1 width=869) (actual time=0.003..0.003 rows=0 loops=40732)
             Index Cond: (update_id = gsl.update_id)
Planning time: 0.874 ms
Execution time: 149.820 ms

In my case I need both the sort and the limit. From some research, I believe that because the status 'STARTED' is so common, the query planner prefers to walk the session_id index backwards when a LIMIT is present, assuming the limit will be satisfied within the first few rows. This assumption is wrong here, because after joining to the sessionlatest table the limit is never satisfied.
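
One common workaround for this kind of misestimate (not something tried in the question; shown only as a sketch) is to keep the sort and the limit but stop the planner from using IDX_session_session_id to provide the ordering, e.g. by wrapping the sort key in a trivial expression, which forces it to join first and sort afterwards:

EXPLAIN ANALYSE SELECT *
FROM session gs
join "public"."sessionlatest" gsl
 on gs."update_id" = gsl."update_id"
WHERE gs."status" = 'STARTED'
order by gs."session_id" + 0 desc   -- the expression prevents the backward index scan
LIMIT 500;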

I tried adding a multi-column index, but it was ignored:

CREATE INDEX idx_session_id_status ON public.session(session_id, status);

I also tried a partial index, but the performance improvement was minimal:

CREATE INDEX idx_session_id_partial_status ON public.session(session_id) WHERE public.session.status='STARTED';

Limit  (cost=0.72..34319.23 rows=500 width=903)
 ->  Nested Loop  (cost=0.72..1145415.36 rows=16688 width=903)
       ->  Index Scan Backward using idx_session_id_partial_status on session gs  (cost=0.43..435253.15 rows=2252261 width=863)
       ->  Index Scan using "IDX_Sessionlatest_update_id" on sessionlatest gsl  (cost=0.29..0.31 rows=1 width=32)
             Index Cond: (update_id = gs.update_id)

Finally, I found a similar question, SELECT very slow when no results and a LIMIT is specified. The main difference is that my edge case is created by a join rather than by multiple filters (and I obviously cannot create an index across two different tables).

A few things you could try:

  • Add status to the sessionlatest table. Since this table acts as a "cache" of the latest sessions, it makes perfect sense to do so (as @jjanus suggested). The whole WHERE - ORDER BY - LIMIT then involves columns from the same (and much smaller) table, so a suitable index can be used efficiently (the join to the big table then only has to find the relevant 500 rows). A rough sketch follows after this list.
  • I don't understand why you join on update_id and not on session_id. That does not help the execution plan: you want to ORDER BY and LIMIT on session_id, but the join is on another column. Perhaps you do it to get a single entry per session; the query could be rewritten to use session_id instead.
  • I would try a LATERAL JOIN, starting from the smaller table.
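
A rough sketch of the first suggestion, under the assumption that status is copied into sessionlatest whenever a session entry is appended (the column and index names here are made up):

alter table sessionlatest add column status varchar(50);

create index "IDX_sessionlatest_session_id_partial_started"
 on sessionlatest (session_id)
 where status = 'STARTED';

SELECT gs.*, gsl.*
FROM sessionlatest gsl
 join session gs
 on gs.update_id = gsl.update_id
WHERE gsl.status = 'STARTED'
order by gsl.session_id desc
LIMIT 500;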

And for the LATERAL JOIN, see this version (using both update_id and session_id):

select gs.*, gsl.*
from sessionlatest gsl
 join lateral
 ( select s.*
   from session s
   where s.session_id = gsl.session_id
     and s.update_id = gsl.update_id
   order by s.update_id desc
   LIMIT 1
 ) as gs
 on gs.status = 'STARTED'
order by gsl.session_id desc
limit 500 ;

I think the above would yield a better execution plan if you had an index on session (session_id, update_id), but you would be better off testing with your own tables (size, distribution, and a few other things matter for performance).
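
That index would be something along these lines (the name is just illustrative):

create index "IDX_session_session_id_update_id" on session (session_id, update_id);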

See it at dbfiddle.uk

Source: https://dba.stackexchange.com/questions/230043