PostgreSQL

Index not picked up on a nullable boolean column

  • February 14, 2022

I have 2 TimescaleDB databases (PROD and DEV). Both have a hypertable with the same schema and nearly the same data (44M rows in the first, 40M in the second).

I need to query some data by a boolean column:

SELECT * FROM pumpcards
WHERE to_label = TRUE
ORDER BY starttime DESC;

Since only a few cards (1xxx) have a non-null value, I created an index:

CREATE INDEX to_label_true ON pumpcards (to_label)
WHERE to_label IS NOT NULL;

Now my SELECT takes 500 ms on the DEV db, but more than 25 seconds on PROD, with the CPU at 100%.

When I run EXPLAIN ANALYZE on the query in DEV, I can see that the index is picked up:

Index Cond: (to_label = true)

But PROD does not use the index; it filters on to_label instead:

Filter: to_label
Rows Removed by Filter: 2164238

I dropped all the indexes and recreated them, and I also ran VACUUM ANALYZE pumpcards. No luck.

My table schema:

CREATE TABLE IF NOT EXISTS public.pumpcards (
   id bigint NOT NULL DEFAULT nextval('pumpcards_id_seq'::regclass),
   wellid integer,
   starttime timestamp without time zone NOT NULL,
   endtime timestamp without time zone NOT NULL,
   physical_card boolean NOT NULL,
   nb_points integer NOT NULL,
   "position" text COLLATE pg_catalog."default" NOT NULL,
   load text COLLATE pg_catalog."default" NOT NULL,
   pressure text COLLATE pg_catalog."default",
   opti_gf double precision,
   ml_gf double precision,
   error_score double precision,
   novelty_score double precision,
   labeled boolean NOT NULL,
   manual_gf double precision,
   to_label boolean,
   erroneous_card boolean,
   fluid_pound boolean,
   gas_interference boolean,
   pump_tagging boolean,
   worn_top_valve boolean,
   worn_bottom_valve boolean,
   stuck_top_valve boolean,
   stuck_bottom_valve boolean,
   worn_barrel boolean,
   unanchored_tubing boolean,
   stuck_pump boolean,
   parted_rod boolean,
   solid_friction_along_rod boolean,
   solid_friction_in_pump boolean,
   tubing_leak boolean,
   tight_stuffing_box boolean,
   flumping boolean,
   depth_of_issue double precision,
   to_investigate boolean,
   position_downhole text COLLATE pg_catalog."default",
   load_downhole text COLLATE pg_catalog."default",
   work double precision NOT NULL,
   work_downhole double precision,
   gas_lock boolean,
   amplitude_dwl real,
   amplitude_sfc real,
   max_position real,
   min_position real,
   max_load real,
   min_load real,
   CONSTRAINT pumpcards_pkey PRIMARY KEY (id, starttime),
   CONSTRAINT pumpcards_wellid_starttime_key UNIQUE (wellid, starttime)
);

CREATE INDEX IF NOT EXISTS pumpcards_id_idx
   ON public.pumpcards USING btree
   (id DESC NULLS FIRST);

CREATE INDEX IF NOT EXISTS pumpcards_labeled_erroneous_card_idx
   ON public.pumpcards USING btree
   (labeled DESC NULLS FIRST, erroneous_card ASC NULLS LAST)
   WHERE manual_gf IS NOT NULL;

CREATE INDEX IF NOT EXISTS pumpcards_starttime_idx
   ON public.pumpcards USING btree
   (starttime DESC NULLS FIRST);

CREATE INDEX to_label_true ON pumpcards (to_label)
WHERE to_label IS NOT NULL;

CREATE INDEX IF NOT EXISTS pumpcards_wellid_starttime_idx
   ON public.pumpcards USING btree
   (wellid ASC NULLS LAST, starttime DESC NULLS FIRST);


CREATE TRIGGER ts_insert_blocker
   BEFORE INSERT
   ON public.pumpcards
   FOR EACH ROW
   EXECUTE FUNCTION _timescaledb_internal.insert_blocker();

A sample from EXPLAIN ANALYZE on PROD:

Parallel Seq Scan on _hyper_53_2637_chunk  (cost=0.00..98436.09 rows=1 width=400) (actual time=10555.252..10555.253 rows=0 loops=1)
                   Filter: to_label
                   Rows Removed by Filter: 2138437

Any ideas?

From what you report, the index should be used. There must be a misunderstanding somewhere.
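One way to narrow this down, as a rough sketch: list the index definitions that actually exist on each database and compare DEV with PROD. `pg_indexes` is a standard PostgreSQL system view. The second query is an assumption on my part: it presumes the chunks live in TimescaleDB's default `_timescaledb_internal` schema, and it reuses the chunk name from the PROD plan in the question.

```sql
-- Run on both DEV and PROD and compare the output.
-- Lists every index defined on the hypertable, with its full definition
-- (including the WHERE clause of partial indexes).
SELECT schemaname, indexname, indexdef
FROM pg_indexes
WHERE tablename = 'pumpcards'
ORDER BY indexname;

-- Assumption: chunks are stored in the default _timescaledb_internal schema.
-- Checks whether the chunk that was scanned sequentially on PROD
-- actually carries a matching partial index.
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = '_timescaledb_internal'
  AND tablename = '_hyper_53_2637_chunk'
ORDER BY indexname;
```

If the definitions differ between the two databases, or the chunk has no index on to_label at all, that would explain the different plans.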

It is unclear which index exactly we are talking about. You mention this one at the top:

CREATE INDEX to_label_true ON pumpcards (to_label)
WHERE to_label IS NOT NULL;

And you show these two different indexes in the table description:

CREATE INDEX IF NOT EXISTS pumpcards_to_label_starttime_idx
   ON public.pumpcards USING btree
   (to_label ASC NULLS LAST, starttime DESC NULLS FIRST)
   WHERE to_label IS NOT NULL;

CREATE INDEX IF NOT EXISTS to_label_true
   ON public.pumpcards USING btree
   (to_label ASC NULLS LAST)
   WHERE to_label IS TRUE;

All of them are different, and none of them is optimal. This one would be a perfect fit for your query:

CREATE INDEX to_label_true_starttime_desc ON public.pumpcards (starttime DESC)
WHERE to_label = true;
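Why this fits: the index only contains rows where to_label is true, and its entries are already sorted by starttime DESC, so the planner can in principle satisfy both the WHERE clause and the ORDER BY from the index alone, without a separate sort. To check whether it is picked up, something along these lines should do (a sketch only; the plan you actually get depends on your statistics and settings):

```sql
-- Refresh planner statistics after creating the index.
ANALYZE public.pumpcards;

-- Re-run the original query and inspect the plan.
-- The predicate to_label = TRUE matches the index's WHERE clause,
-- so the partial index is applicable to this query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM pumpcards
WHERE to_label = TRUE
ORDER BY starttime DESC;
```

Run it on both DEV and PROD. If PROD still shows a Seq Scan with "Filter: to_label" on its chunks, compare the index definitions on those chunks next.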

Source: https://dba.stackexchange.com/questions/307333