Postgresql
提高 PostgreSQL 表中大掃描計算的性能
我有一個表格,
flags
在他們的. 這個想法是從總貢獻中計算有多少使用者說“是”(可以是“否”或“修復”或其他一些不確定的字元串)。該表有超過 200 萬行。添加了更多資訊:user``polygon``flag_value``usersession``flags``EXPLAIN ANALYZE
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT POSITIVE.flag_score AS p_score, TOTAL.flag_score AS t_score, 'yes' AS consensus, 'geometry' AS task, P.id AS polygon_id, 'Polygon', now(), now() FROM polygons AS P LEFT JOIN consensuspolygons AS C ON C.flaggable_id = P.id AND C.task = 'geometry' INNER JOIN ( SELECT _F.flaggable_id, COUNT(*) AS flag_count, SUM(CASE WHEN _U.role = 'admin' THEN 4 ELSE 1 END) AS flag_score FROM flags AS _F LEFT JOIN usersessions _S ON _S.session_id = _F.session_id LEFT JOIN users _U ON _U.id = _S.user_id WHERE _F.flag_value = 'yes' AND _F.flag_type = 'geometry' AND _F.flaggable_type = 'Polygon' GROUP BY _F.flaggable_id HAVING COUNT(*) >= 1 ) AS POSITIVE ON POSITIVE.flaggable_id = P.id INNER JOIN ( SELECT _F.flaggable_id, COUNT(*) AS flag_count, SUM(CASE WHEN _U.role = 'admin' THEN 4 ELSE 1 END) AS flag_score FROM flags AS _F LEFT JOIN usersessions _S ON _S.session_id = _F.session_id LEFT JOIN users _U ON _U.id = _S.user_id WHERE _F.flag_type = 'geometry' AND _F.flaggable_type = 'Polygon' GROUP BY _F.flaggable_id HAVING COUNT(*) >= 1 ) AS TOTAL ON TOTAL.flaggable_id = P.id
一個範例結果類似於:
查詢有效,但每個多邊形需要一兩秒鐘。根據下面的解釋,我看到順序掃描是最昂貴/最大的,但我不確定如何需要哪些索引。我試過了:
session_id
上和user_id
上的索引usersessions
flag_value
( ,flag_type
,flaggable_type
)上的組合索引flags
flag_value
( ,flaggable_type
)上的組合索引flags
任何想法都會受到歡迎。在查詢計劃和表 SQL 下方:
Limit (cost=146866.65..146984.55 rows=5 width=132) (actual time=1376.759..1419.053 rows=5 loops=1) Output: (sum(CASE WHEN ((_u.role)::text = 'admin'::text) THEN 4 ELSE 1 END)), (sum(CASE WHEN ((_u_1.role)::text = 'admin'::text) THEN 4 ELSE 1 END)), 'yes'::text, 'geometry'::text, p.id, 'Polygon'::text, (now()), (now()) Buffers: shared hit=31936 read=44380, temp read=3953 written=6248 -> Merge Join (cost=146866.65..179857.02 rows=1399 width=132) (actual time=1376.758..1419.051 rows=5 loops=1) Output: (sum(CASE WHEN ((_u.role)::text = 'admin'::text) THEN 4 ELSE 1 END)), (sum(CASE WHEN ((_u_1.role)::text = 'admin'::text) THEN 4 ELSE 1 END)), 'yes'::text, 'geometry'::text, p.id, 'Polygon'::text, now(), now() Inner Unique: true Merge Cond: (p.id = _f_1.flaggable_id) Buffers: shared hit=31936 read=44380, temp read=3953 written=6248 -> Merge Join (cost=61610.62..81382.32 rows=16170 width=16) (actual time=475.174..501.509 rows=5 loops=1) Output: p.id, (sum(CASE WHEN ((_u.role)::text = 'admin'::text) THEN 4 ELSE 1 END)), _f.flaggable_id Inner Unique: true Merge Cond: (p.id = _f.flaggable_id) Buffers: shared hit=15960 read=22238, temp read=1148 written=1248 -> Index Only Scan using polygons_pkey on public.polygons p (cost=0.42..4977.43 rows=191267 width=4) (actual time=0.007..0.011 rows=6 loops=1) Output: p.id Heap Fetches: 0 Buffers: shared hit=4 -> Finalize GroupAggregate (cost=61610.20..75562.91 rows=16170 width=20) (actual time=475.164..501.492 rows=5 loops=1) Output: _f.flaggable_id, NULL::bigint, sum(CASE WHEN ((_u.role)::text = 'admin'::text) THEN 4 ELSE 1 END) Group Key: _f.flaggable_id Filter: (count(*) >= 1) Buffers: shared hit=15956 read=22238, temp read=1148 written=1248 -> Gather Merge (cost=61610.20..74228.88 rows=97020 width=20) (actual time=475.144..501.478 rows=13 loops=1) Output: _f.flaggable_id, (PARTIAL sum(CASE WHEN ((_u.role)::text = 'admin'::text) THEN 4 ELSE 1 END)), (PARTIAL count(*)) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=15956 read=22238, temp read=1148 written=1248 -> Partial GroupAggregate (cost=60610.18..62030.34 rows=48510 width=20) (actual time=464.223..465.270 rows=781 loops=3) Output: _f.flaggable_id, PARTIAL sum(CASE WHEN ((_u.role)::text = 'admin'::text) THEN 4 ELSE 1 END), PARTIAL count(*) Group Key: _f.flaggable_id Buffers: shared hit=15956 read=22238, temp read=1148 written=1248 Worker 0: actual time=463.560..465.163 rows=1170 loops=1 Buffers: shared hit=4716 read=7226, temp read=379 written=413 Worker 1: actual time=469.522..471.025 rows=1170 loops=1 Buffers: shared hit=4957 read=7185, temp read=379 written=414 -> Sort (cost=60610.18..60797.19 rows=74805 width=13) (actual time=464.204..464.587 rows=2101 loops=3) Output: _f.flaggable_id, _u.role Sort Key: _f.flaggable_id Sort Method: external merge Disk: 3344kB Buffers: shared hit=15956 read=22238, temp read=1148 written=1248 Worker 0: actual time=463.542..464.124 rows=3115 loops=1 Sort Method: external merge Disk: 3288kB Buffers: shared hit=4716 read=7226, temp read=379 written=413 Worker 1: actual time=469.497..470.054 rows=3174 loops=1 Sort Method: external merge Disk: 3296kB Buffers: shared hit=4957 read=7185, temp read=379 written=414 -> Hash Left Join (cost=241.31..54554.40 rows=74805 width=13) (actual time=4.276..359.295 rows=170970 loops=3) Output: _f.flaggable_id, _u.role Hash Cond: ((_f.session_id)::text = (_s.session_id)::text) Buffers: shared hit=15882 read=22238 Worker 0: actual time=4.279..349.064 rows=169302 loops=1 Buffers: shared hit=4679 read=7226 Worker 1: actual time=4.314..354.762 rows=169343 loops=1 Buffers: shared hit=4920 read=7185 -> Parallel Seq Scan on public.flags _f (cost=0.00..53444.43 rows=74805 width=37) (actual time=0.158..292.951 rows=170379 loops=3) Output: _f.id, _f.flag_type, _f.flaggable_id, _f.session_id, _f.flag_value, _f.is_primary, _f.created_at, _f.updated_at, _f.latitude, _f.longitude, _f.flaggable_type Filter: ((_f.flag_value = 'yes'::text) AND ((_f.flag_type)::text = 'geometry'::text) AND ((_f.flaggable_type)::text = 'Polygon'::text)) Rows Removed by Filter: 545343 Buffers: shared hit=15550 read=22238 Worker 0: actual time=0.030..277.750 rows=168670 loops=1 Buffers: shared hit=4560 read=7226 Worker 1: actual time=0.026..282.599 rows=168781 loops=1 Buffers: shared hit=4801 read=7185 -> Hash (cost=187.94..187.94 rows=4270 width=42) (actual time=4.064..4.069 rows=4270 loops=3) Output: _s.session_id, _u.role Buckets: 8192 Batches: 1 Memory Usage: 377kB Buffers: shared hit=304 Worker 0: actual time=4.161..4.167 rows=4270 loops=1 Buffers: shared hit=105 Worker 1: actual time=4.218..4.225 rows=4270 loops=1 Buffers: shared hit=105 -> Hash Left Join (cost=85.00..187.94 rows=4270 width=42) (actual time=0.871..2.891 rows=4270 loops=3) Output: _s.session_id, _u.role Inner Unique: true Hash Cond: (_s.user_id = _u.id) Buffers: shared hit=304 Worker 0: actual time=1.027..2.941 rows=4270 loops=1 Buffers: shared hit=105 Worker 1: actual time=0.922..3.006 rows=4270 loops=1 Buffers: shared hit=105 -> Seq Scan on public.usersessions _s (cost=0.00..91.70 rows=4270 width=37) (actual time=0.031..0.590 rows=4270 loops=3) Output: _s.id, _s.user_id, _s.session_id, _s.created_at, _s.updated_at Buffers: shared hit=147 Worker 0: actual time=0.050..0.604 rows=4270 loops=1 Buffers: shared hit=49 Worker 1: actual time=0.041..0.639 rows=4270 loops=1 Buffers: shared hit=49 -> Hash (cost=62.78..62.78 rows=1778 width=13) (actual time=0.816..0.816 rows=1778 loops=3) Output: _u.role, _u.id Buckets: 2048 Batches: 1 Memory Usage: 100kB Buffers: shared hit=135 Worker 0: actual time=0.938..0.939 rows=1778 loops=1 Buffers: shared hit=45 Worker 1: actual time=0.853..0.853 rows=1778 loops=1 Buffers: shared hit=45 -> Seq Scan on public.users _u (cost=0.00..62.78 rows=1778 width=13) (actual time=0.026..0.529 rows=1778 loops=3) Output: _u.role, _u.id Buffers: shared hit=135 Worker 0: actual time=0.040..0.615 rows=1778 loops=1 Buffers: shared hit=45 Worker 1: actual time=0.033..0.574 rows=1778 loops=1 Buffers: shared hit=45 -> Finalize GroupAggregate (cost=85256.02..98206.42 rows=16549 width=20) (actual time=901.579..917.534 rows=6 loops=1) Output: _f_1.flaggable_id, NULL::bigint, sum(CASE WHEN ((_u_1.role)::text = 'admin'::text) THEN 4 ELSE 1 END) Group Key: _f_1.flaggable_id Filter: (count(*) >= 1) Buffers: shared hit=15976 read=22142, temp read=2805 written=5000 -> Gather Merge (cost=85256.02..96841.13 rows=99294 width=20) (actual time=901.566..917.519 rows=19 loops=1) Output: _f_1.flaggable_id, (PARTIAL sum(CASE WHEN ((_u_1.role)::text = 'admin'::text) THEN 4 ELSE 1 END)), (PARTIAL count(*)) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=15976 read=22142, temp read=2805 written=5000 -> Sort (cost=84256.00..84380.12 rows=49647 width=20) (actual time=884.539..884.650 rows=782 loops=3) Output: _f_1.flaggable_id, (PARTIAL sum(CASE WHEN ((_u_1.role)::text = 'admin'::text) THEN 4 ELSE 1 END)), (PARTIAL count(*)) Sort Key: _f_1.flaggable_id Sort Method: external merge Disk: 3488kB Buffers: shared hit=15976 read=22142, temp read=2805 written=5000 Worker 0: actual time=877.889..878.075 rows=1170 loops=1 Sort Method: external merge Disk: 3424kB Buffers: shared hit=4826 read=7010, temp read=930 written=1661 Worker 1: actual time=877.126..877.268 rows=1170 loops=1 Sort Method: external merge Disk: 3376kB Buffers: shared hit=4873 read=6965, temp read=917 written=1647 -> Partial HashAggregate (cost=76816.50..80383.68 rows=49647 width=20) (actual time=686.588..775.954 rows=102917 loops=3) Output: _f_1.flaggable_id, PARTIAL sum(CASE WHEN ((_u_1.role)::text = 'admin'::text) THEN 4 ELSE 1 END), PARTIAL count(*) Group Key: _f_1.flaggable_id Batches: 5 Memory Usage: 4145kB Disk Usage: 6992kB Buffers: shared hit=15962 read=22142, temp read=1519 written=3708 Worker 0: actual time=678.709..778.991 rows=102760 loops=1 Batches: 5 Memory Usage: 4145kB Disk Usage: 6944kB Buffers: shared hit=4819 read=7010, temp read=502 written=1231 Worker 1: actual time=679.094..762.217 rows=101274 loops=1 Batches: 5 Memory Usage: 4145kB Disk Usage: 6928kB Buffers: shared hit=4866 read=6965, temp read=495 written=1223 -> Hash Left Join (cost=241.31..55100.49 rows=314440 width=13) (actual time=4.553..446.632 rows=297380 loops=3) Output: _f_1.flaggable_id, _u_1.role Hash Cond: ((_f_1.session_id)::text = (_s_1.session_id)::text) Buffers: shared hit=15962 read=22142 Worker 0: actual time=5.245..455.337 rows=288616 loops=1 Buffers: shared hit=4819 read=7010 Worker 1: actual time=4.870..422.737 rows=290016 loops=1 Buffers: shared hit=4866 read=6965 -> Parallel Seq Scan on public.flags _f_1 (cost=0.00..51207.80 rows=314440 width=37) (actual time=0.084..309.027 rows=296552 loops=3) Output: _f_1.id, _f_1.flag_type, _f_1.flaggable_id, _f_1.session_id, _f_1.flag_value, _f_1.is_primary, _f_1.created_at, _f_1.updated_at, _f_1.latitude, _f_1.longitude, _f_1.flaggable_type Filter: (((_f_1.flag_type)::text = 'geometry'::text) AND ((_f_1.flaggable_type)::text = 'Polygon'::text)) Rows Removed by Filter: 419171 Buffers: shared hit=15646 read=22142 Worker 0: actual time=0.140..325.695 rows=287887 loops=1 Buffers: shared hit=4708 read=7010 Worker 1: actual time=0.023..276.184 rows=289219 loops=1 Buffers: shared hit=4755 read=6965 -> Hash (cost=187.94..187.94 rows=4270 width=42) (actual time=4.383..4.388 rows=4270 loops=3) Output: _s_1.session_id, _u_1.role Buckets: 8192 Batches: 1 Memory Usage: 377kB Buffers: shared hit=288 Worker 0: actual time=4.952..4.958 rows=4270 loops=1 Buffers: shared hit=97 Worker 1: actual time=4.754..4.760 rows=4270 loops=1 Buffers: shared hit=97 -> Hash Left Join (cost=85.00..187.94 rows=4270 width=42) (actual time=1.082..3.155 rows=4270 loops=3) Output: _s_1.session_id, _u_1.role Inner Unique: true Hash Cond: (_s_1.user_id = _u_1.id) Buffers: shared hit=288 Worker 0: actual time=1.307..3.560 rows=4270 loops=1 Buffers: shared hit=97 Worker 1: actual time=1.264..3.393 rows=4270 loops=1 Buffers: shared hit=97 -> Seq Scan on public.usersessions _s_1 (cost=0.00..91.70 rows=4270 width=37) (actual time=0.043..0.609 rows=4270 loops=3) Output: _s_1.id, _s_1.user_id, _s_1.session_id, _s_1.created_at, _s_1.updated_at Buffers: shared hit=147 Worker 0: actual time=0.068..0.708 rows=4270 loops=1 Buffers: shared hit=49 Worker 1: actual time=0.058..0.640 rows=4270 loops=1 Buffers: shared hit=49 -> Hash (cost=62.78..62.78 rows=1778 width=13) (actual time=1.020..1.020 rows=1778 loops=3) Output: _u_1.role, _u_1.id Buckets: 2048 Batches: 1 Memory Usage: 100kB Buffers: shared hit=135 Worker 0: actual time=1.211..1.211 rows=1778 loops=1 Buffers: shared hit=45 Worker 1: actual time=1.182..1.182 rows=1778 loops=1 Buffers: shared hit=45 -> Seq Scan on public.users _u_1 (cost=0.00..62.78 rows=1778 width=13) (actual time=0.028..0.644 rows=1778 loops=3) Output: _u_1.role, _u_1.id Buffers: shared hit=135 Worker 0: actual time=0.050..0.798 rows=1778 loops=1 Buffers: shared hit=45 Worker 1: actual time=0.030..0.778 rows=1778 loops=1 Buffers: shared hit=45 Planning: Buffers: shared hit=32 Planning Time: 0.594 ms Execution Time: 1447.677 ms
表:
CREATE TABLE public.consensuspolygons ( id integer NOT NULL DEFAULT nextval('consensuspolygons_id_seq'::regclass), task character varying(255) COLLATE pg_catalog."default", flaggable_id integer, consensus text COLLATE pg_catalog."default", flaggable_type character varying(255) COLLATE pg_catalog."default" DEFAULT 'Polygon'::character varying, CONSTRAINT consensuspolygons_pkey PRIMARY KEY (id) ) CREATE INDEX index_task ON public.consensuspolygons USING btree (task COLLATE pg_catalog."default" ASC NULLS LAST) TABLESPACE pg_default; CREATE UNIQUE INDEX index_task_consensus_on_polygon_id ON public.consensuspolygons USING btree (flaggable_id ASC NULLS LAST, task COLLATE pg_catalog."default" ASC NULLS LAST) TABLESPACE pg_default; CREATE TABLE public.flags ( id integer NOT NULL DEFAULT nextval('flags_id_seq'::regclass), flag_type character varying(255) COLLATE pg_catalog."default", flaggable_id integer, session_id character varying(255) COLLATE pg_catalog."default", flag_value text COLLATE pg_catalog."default", flaggable_type character varying(255) COLLATE pg_catalog."default" DEFAULT 'Polygon'::character varying, CONSTRAINT flags_pkey PRIMARY KEY (id) ) CREATE UNIQUE INDEX index_flags_on_session_id ON public.flags USING btree (session_id COLLATE pg_catalog."default" ASC NULLS LAST, flag_type COLLATE pg_catalog."default" ASC NULLS LAST, flaggable_id ASC NULLS LAST, flag_value COLLATE pg_catalog."default" ASC NULLS LAST) TABLESPACE pg_default; CREATE INDEX polygon_index ON public.flags USING btree (flaggable_id ASC NULLS LAST) TABLESPACE pg_default; CREATE TABLE public.usersessions ( id integer NOT NULL DEFAULT nextval('usersessions_id_seq'::regclass), user_id integer, session_id character varying(255) COLLATE pg_catalog."default", created_at timestamp without time zone NOT NULL, updated_at timestamp without time zone NOT NULL, CONSTRAINT usersessions_pkey PRIMARY KEY (id) )
編輯:
@jjanes ( DB Fiddle ) 建議的新查詢:
SELECT flag_score.flag_score_pos, flag_score.flag_score_tot, 'yes' AS consensus, 'geometry' AS task, P.id AS polygon_id, 'Polygon', now(), now() FROM polygons AS P LEFT JOIN consensuspolygons AS C ON C.flaggable_id = P.id AND C.task = 'geometry' INNER JOIN ( SELECT _F.flaggable_id, SUM(CASE WHEN _U.role = 'admin' THEN 4 ELSE 1 END) AS flag_score_tot, SUM(CASE WHEN _U.role = 'admin' THEN 4 ELSE 1 END) FILTER (WHERE flag_value = 'yes') AS flag_score_pos FROM flags AS _F LEFT JOIN usersessions _S ON _S.session_id = _F.session_id LEFT JOIN users _U ON _U.id = _S.user_id WHERE _F.flag_type = 'geometry' AND _F.flaggable_type = 'Polygon' GROUP BY _F.flaggable_id HAVING COUNT(*) FILTER (WHERE flag_value = 'yes') >= 1 ) AS flag_score ON flag_score.flaggable_id = P.id
它可以工作,但“只”快了大約三分之一(對於近 300k 行,約 1.08 秒而不是約 1.44 秒),因為它正在保存兩個順序掃描中的一個。是否可以添加索引帽來優化順序掃描?
查詢計劃:
Merge Join (cost=97612.95..116721.86 rows=16549 width=132) (actual time=740.885..1039.578 rows=106725 loops=1) Merge Cond: (p.id = _f.flaggable_id) -> Index Only Scan using polygons_pkey on polygons p (cost=0.42..4977.43 rows=191267 width=4) (actual time=0.011..24.771 rows=191265 loops=1) Heap Fetches: 0 -> Finalize GroupAggregate (cost=97612.53..110811.17 rows=16549 width=20) (actual time=740.869..957.634 rows=107660 loops=1) Group Key: _f.flaggable_id Filter: (count(*) FILTER (WHERE (_f.flag_value = 'yes'::text)) >= 1) Rows Removed by Filter: 30324 -> Gather Merge (cost=97612.53..109197.64 rows=99294 width=28) (actual time=740.856..855.408 rows=309388 loops=1) Workers Planned: 2 Workers Launched: 2 -> Sort (cost=96612.51..96736.63 rows=49647 width=28) (actual time=735.751..761.491 rows=103129 loops=3) Sort Key: _f.flaggable_id Sort Method: external merge Disk: 4136kB Worker 0: Sort Method: external merge Disk: 4024kB Worker 1: Sort Method: external merge Disk: 4008kB -> Partial HashAggregate (cost=87330.59..92740.19 rows=49647 width=28) (actual time=546.162..643.903 rows=103129 loops=3) Group Key: _f.flaggable_id Planned Partitions: 4 Batches: 5 Memory Usage: 4145kB Disk Usage: 7128kB Worker 0: Batches: 5 Memory Usage: 4145kB Disk Usage: 7064kB Worker 1: Batches: 5 Memory Usage: 4145kB Disk Usage: 7056kB -> Hash Left Join (cost=241.31..55100.49 rows=314440 width=34) (actual time=3.961..325.059 rows=297380 loops=3) Hash Cond: ((_f.session_id)::text = (_s.session_id)::text) -> Parallel Seq Scan on flags _f (cost=0.00..51207.80 rows=314440 width=58) (actual time=0.098..233.054 rows=296552 loops=3) Filter: (((flag_type)::text = 'geometry'::text) AND ((flaggable_type)::text = 'Polygon'::text)) Rows Removed by Filter: 419171 -> Hash (cost=187.94..187.94 rows=4270 width=42) (actual time=3.812..3.817 rows=4270 loops=3) Buckets: 8192 Batches: 1 Memory Usage: 377kB -> Hash Left Join (cost=85.00..187.94 rows=4270 width=42) (actual time=0.848..2.711 rows=4270 loops=3) Hash Cond: (_s.user_id = _u.id) -> Seq Scan on usersessions _s (cost=0.00..91.70 rows=4270 width=37) (actual time=0.024..0.556 rows=4270 loops=3) -> Hash (cost=62.78..62.78 rows=1778 width=13) (actual time=0.807..0.808 rows=1778 loops=3) Buckets: 2048 Batches: 1 Memory Usage: 100kB -> Seq Scan on users _u (cost=0.00..62.78 rows=1778 width=13) (actual time=0.020..0.539 rows=1778 loops=3) Planning Time: 0.488 ms Execution Time: 1086.747 ms
您可以使用 FILTER 將額外條件添加到一個聚合而不是其他聚合,因此您的兩個子查詢可以簡化為:
INNER JOIN ( SELECT _F.flaggable_id, SUM(CASE WHEN _U.role = 'admin' THEN 4 ELSE 1 END) AS flag_score_tot, SUM(CASE WHEN _U.role = 'admin' THEN 4 ELSE 1 END) FILTER (WHERE flag_value = 'yes') AS flag_score_pos FROM flags AS _F LEFT JOIN usersessions _S ON _S.session_id = _F.session_id LEFT JOIN users _U ON _U.id = _S.user_id WHERE _F.flag_type = 'geometry' AND _F.flaggable_type = 'Polygon' GROUP BY _F.flaggable_id HAVING COUNT(*) FILTER (WHERE flag_value = 'yes') >= 1 ) AS flag_score
可能有幫助的另一件事是增加 work_mem 足以使 hashagg 和 sort 不會溢出到磁碟。