Postgresql

僅在 MacOS 上的 postgresql 10.5 慢查詢問題

  • September 5, 2018

考慮這 3 個表

成員

CREATE TABLE public.members
(
   id integer NOT NULL DEFAULT nextval('members_id_seq'::regclass),
   client_id integer,
   login character varying(255) COLLATE pg_catalog."default" NOT NULL,
   password character varying(255) COLLATE pg_catalog."default" NOT NULL,
   CONSTRAINT members_pkey PRIMARY KEY (id),
   CONSTRAINT fk_45a0d2ff19eb6921 FOREIGN KEY (client_id)
       REFERENCES public.clients (id) MATCH SIMPLE
       ON UPDATE NO ACTION
       ON DELETE CASCADE
)
WITH (
   OIDS = FALSE
)
TABLESPACE pg_default;

CREATE INDEX idx_45a0d2ff19eb6921
   ON public.members USING btree
   (client_id)
TABLESPACE pg_default;

客戶

CREATE TABLE public.clients
(
   id integer NOT NULL DEFAULT nextval('clients_id_seq'::regclass),
   name character varying(255) COLLATE pg_catalog."default" NOT NULL,
   CONSTRAINT clients_pkey PRIMARY KEY (id)
WITH (
   OIDS = FALSE
)
TABLESPACE pg_default;

發票

CREATE TABLE public.invoices
(
   id integer NOT NULL DEFAULT nextval('invoices_id_seq'::regclass),
   client_id integer,
   member_id integer,
   CONSTRAINT invoices_pkey PRIMARY KEY (id),
   CONSTRAINT fk_6a2f2f9519eb6921 FOREIGN KEY (client_id)
       REFERENCES public.clients (id) MATCH SIMPLE
       ON UPDATE NO ACTION
       ON DELETE SET NULL,
   CONSTRAINT fk_6a2f2f957597d3fe FOREIGN KEY (member_id)
       REFERENCES public.members (id) MATCH SIMPLE
       ON UPDATE NO ACTION
       ON DELETE SET NULL
)
WITH (
   OIDS = FALSE
)
TABLESPACE pg_default;

CREATE INDEX idx_6a2f2f9519eb6921
   ON public.invoices USING btree
   (client_id)
   TABLESPACE pg_default;

CREATE INDEX idx_6a2f2f957597d3fe
   ON public.invoices USING btree
   (member_id)
   TABLESPACE pg_default;

我嘗試在幾個環境中執行下面的簡單查詢

SELECT i.*, m.id AS member_id FROM invoices i 
LEFT JOIN members m ON m.id = i.member_id 
WHERE i.member_id is null;

在 DEBIAN 7 - 8 上,postgresql 9.6 > 效果很好

在 DEBIAN 7 - 8 上,postgresql 10.5 > 效果很好

在 MacOS 10.13 上,postgresql 9.6 > 執行良好

在 MacOS 10.13 上,postgresql 10.1 > 執行時間為 4min

在 MacOS 10.13 上,postgresql 10.5 (dev env) > 執行時間為 4 分鐘

我不明白為什麼這個簡單的查詢會在我的開發機器上花費大量時間。有 ~140000 張發票,此查詢應顯示 ~13000 條記錄。

編輯

如所問,這是我的開發環境(Mac os - postgresql 10.5)上 EXPLAIN(analyze, buffers) 的結果:

Nested Loop Left Join  (cost=9.41..298.20 rows=81 width=1177) (actual time=117.131..366691.852 rows=13429 loops=1)
  Join Filter: (m.id = i.member_id)
  Rows Removed by Join Filter: 1844365718
  Buffers: shared hit=127505, temp read=3169008 written=235
  ->  Bitmap Heap Scan on invoices i  (cost=9.04..288.60 rows=81 width=1173) (actual time=1.178..13.076 rows=13429 loops=1)
        Recheck Cond: (member_id IS NULL)
        Heap Blocks: exact=651
        Buffers: shared hit=691
        ->  Bitmap Index Scan on idx_6a2f2f957597d3fe  (cost=0.00..9.02 rows=81 width=0) (actual time=1.054..1.054 rows=13429 loops=1)
              Index Cond: (member_id IS NULL)
              Buffers: shared hit=40
  ->  Materialize  (cost=0.38..8.39 rows=1 width=4) (actual time=0.005..15.093 rows=137342 loops=13429)
        Buffers: shared hit=126814, temp read=3169008 written=235
        ->  Index Only Scan using members_pkey on members m  (cost=0.38..8.39 rows=1 width=4) (actual time=0.082..67.872 rows=137342 loops=1)
              Heap Fetches: 137342
              Buffers: shared hit=126814
Planning time: 0.341 ms
Execution time: 366695.678 ms

在 Debian 8 - postgresql 10.5 上也是如此:

rows=13774 loops=1)
  Hash Cond: (i.member_id = m.id)
  Buffers: shared hit=1089 read=4461, temp written=202
  ->  Bitmap Heap Scan on invoices i  (cost=258.39..3122.13 rows=13674 width=732) (actual time=3.299..5.702 rows=13774 loops=1)
        Recheck Cond: (member_id IS NULL)
        Heap Blocks: exact=661
        Buffers: shared hit=702
        ->  Bitmap Index Scan on idx_6a2f2f957597d3fe  (cost=0.00..254.97 rows=13674 width=0) (actual time=2.928..2.928 rows=13774 loops=1)
              Index Cond: (member_id IS NULL)
              Buffers: shared hit=41
  ->  Hash  (cost=6219.50..6219.50 rows=137450 width=4) (actual time=113.745..113.745 rows=137450 loops=1)
        Buckets: 131072  Batches: 2  Memory Usage: 3447kB
        Buffers: shared hit=384 read=4461, temp written=201
        ->  Seq Scan on members m  (cost=0.00..6219.50 rows=137450 width=4) (actual time=0.056..54.524 rows=137450 loops=1)
              Buffers: shared hit=384 read=4461
Planning time: 3.456 ms
Execution time: 126.951 ms

任何想法將不勝感激

執行計劃顯示,對 mac 上的行數的估計完全偏離了。

如果優化器對您的數據有錯誤的統計資訊,它將選擇錯誤的執行計劃。

從只有 1 行的表中檢索數據的執行計劃與從具有一百萬行的表中檢索數據的執行計劃有很大不同。

低估執行計劃中一個步驟返回的行數將導致執行計劃效率低下。

在您的 Mac 安裝中,優化器認為成員表僅包含一行,因此優化器選擇僅對一行有效的計劃。但實際上該表包含 137342 行。

當統計資訊正確時,對於那麼多行,選擇的“嵌套循環”比優化器選擇的“雜湊連接”要慢得多。

該手冊更詳細地解釋了這一點。

另一個了解執行計劃的好地方是https://use-the-index-luke.com/

引用自:https://dba.stackexchange.com/questions/216765