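PostgreSQL
Index not picked up on a nullable boolean column
I have 2 TimescaleDB databases (PROD and DEV). Both have a hypertable with the same schema and almost the same data (44M rows in the first, 40M in the second).
I have to query some data on a boolean column: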
SELECT * FROM pumpcards WHERE to_label = TRUE ORDER BY starttime DESC;
Since only a few cards (1xxx) have a non-null value, I created an index:
CREATE INDEX to_label_true ON pumpcards (to_label) WHERE to_label IS NOT NULL;
Now my SELECT takes 500 ms on the DEV database, but more than 25 seconds on PROD, with the CPU at 100%.
In the EXPLAIN ANALYZE output for the query on DEV, I can see the index being picked up:
Index Cond: (to_label = true)
But PROD does not use the index; it filters on to_label instead:
Filter: to_label
Rows Removed by Filter: 2164238
I dropped all the indexes and recreated them, and I also ran VACUUM ANALYZE pumpcards. No luck. My table schema:
CREATE TABLE IF NOT EXISTS public.pumpcards (
    id bigint NOT NULL DEFAULT nextval('pumpcards_id_seq'::regclass),
    wellid integer,
    starttime timestamp without time zone NOT NULL,
    endtime timestamp without time zone NOT NULL,
    physical_card boolean NOT NULL,
    nb_points integer NOT NULL,
    "position" text COLLATE pg_catalog."default" NOT NULL,
    load text COLLATE pg_catalog."default" NOT NULL,
    pressure text COLLATE pg_catalog."default",
    opti_gf double precision,
    ml_gf double precision,
    error_score double precision,
    novelty_score double precision,
    labeled boolean NOT NULL,
    manual_gf double precision,
    to_label boolean,
    erroneous_card boolean,
    fluid_pound boolean,
    gas_interference boolean,
    pump_tagging boolean,
    worn_top_valve boolean,
    worn_bottom_valve boolean,
    stuck_top_valve boolean,
    stuck_bottom_valve boolean,
    worn_barrel boolean,
    unanchored_tubing boolean,
    stuck_pump boolean,
    parted_rod boolean,
    solid_friction_along_rod boolean,
    solid_friction_in_pump boolean,
    tubing_leak boolean,
    tight_stuffing_box boolean,
    flumping boolean,
    depth_of_issue double precision,
    to_investigate boolean,
    position_downhole text COLLATE pg_catalog."default",
    load_downhole text COLLATE pg_catalog."default",
    work double precision NOT NULL,
    work_downhole double precision,
    gas_lock boolean,
    amplitude_dwl real,
    amplitude_sfc real,
    max_position real,
    min_position real,
    max_load real,
    min_load real,
    CONSTRAINT pumpcards_pkey PRIMARY KEY (id, starttime),
    CONSTRAINT pumpcards_wellid_starttime_key UNIQUE (wellid, starttime)
);
CREATE INDEX IF NOT EXISTS pumpcards_id_idx ON public.pumpcards USING btree (id DESC NULLS FIRST);
CREATE INDEX IF NOT EXISTS pumpcards_labeled_erroneous_card_idx ON public.pumpcards USING btree (labeled DESC NULLS FIRST, erroneous_card ASC NULLS LAST) WHERE manual_gf IS NOT NULL;
CREATE INDEX IF NOT EXISTS pumpcards_starttime_idx ON public.pumpcards USING btree (starttime DESC NULLS FIRST);
CREATE INDEX to_label_true ON pumpcards (to_label) WHERE to_label IS NOT NULL;
CREATE INDEX IF NOT EXISTS pumpcards_wellid_starttime_idx ON public.pumpcards USING btree (wellid ASC NULLS LAST, starttime DESC NULLS FIRST);
CREATE TRIGGER ts_insert_blocker BEFORE INSERT ON public.pumpcards FOR EACH ROW EXECUTE FUNCTION _timescaledb_internal.insert_blocker();
An excerpt of the EXPLAIN ANALYZE output on PROD:
Parallel Seq Scan on _hyper_53_2637_chunk (cost=0.00..98436.09 rows=1 width=400) (actual time=10555.252..10555.253 rows=0 loops=1)
  Filter: to_label
  Rows Removed by Filter: 2138437
Any ideas?
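From what you report, the index should be used. There must be a misunderstanding somewhere.
It is unclear which index we are actually talking about. You mention this one at the top: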
CREATE INDEX to_label_true ON pumpcards (to_label) WHERE to_label IS NOT NULL;
And the table description shows these two different indexes:
CREATE INDEX IF NOT EXISTS pumpcards_to_label_starttime_idx ON public.pumpcards USING btree (to_label ASC NULLS LAST, starttime DESC NULLS FIRST) WHERE to_label IS NOT NULL;
CREATE INDEX IF NOT EXISTS to_label_true ON public.pumpcards USING btree (to_label ASC NULLS LAST) WHERE to_label IS TRUE;
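To settle which definitions actually exist, a catalog query along these lines could be run on both DEV and PROD and the results compared. This is only a sketch: it uses the standard pg_indexes view and matches on the column name in the index definition, so it also lists the per-chunk copies that TimescaleDB creates for a hypertable index.

```sql
-- List every index whose definition mentions to_label,
-- on the hypertable itself and on each of its chunks.
SELECT schemaname, tablename, indexname, indexdef
FROM pg_indexes
WHERE indexdef ILIKE '%to_label%'
ORDER BY schemaname, tablename, indexname;
```

If PROD and DEV return different definitions, or some chunks on PROD have no matching index at all, that would explain the different plans.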
All three definitions are different, and none of them is optimal. This one would be a perfect fit for your query:
CREATE INDEX to_label_true_starttime_desc ON public.pumpcards (starttime DESC) WHERE to_label = true;
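The reason this index fits: its predicate matches the query's WHERE clause, so it holds only the few rows where to_label is true, and its key is already ordered by starttime DESC, so the planner can return rows in index order without a separate sort. A minimal way to check that PROD picks it up, assuming the index above has been created, might look like this:

```sql
-- Refresh planner statistics after creating the index.
ANALYZE public.pumpcards;

-- Run the original query and show the actual plan and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM pumpcards WHERE to_label = TRUE ORDER BY starttime DESC;
```

On a hypertable the plan is built per chunk, so the thing to look for is an index scan on each chunk's copy of to_label_true_starttime_desc rather than a Parallel Seq Scan with Filter: to_label.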