Postgresql

為什麼我的索引 DISTINCT ON 比我的 INNER JOIN 慢得多?

  • May 18, 2020

我有兩張桌子,customerspurchases。每個客戶有很多(數千)次購買。我通常只需要每個客戶的最新購買,這就是為什麼我有latest_purchase_id列並在我添加購買時使用觸發器更新它(請參閱https://dba.stackexchange.com/a/243988/186435)。

我寧願不使用觸發器,所以我嘗試使用DISTINCT ON帶有索引的查詢,但它要慢得多,我不知道為什麼。

customers

      Column        |  Type    |                       Modifiers                        | Storage  | Stats target | Description
---------------------+----------+--------------------------------------------------------+----------+--------------+-------------
id                  | integer  | not null default nextval('customers_id_seq'::regclass) | plain    |              |
latest_purchase_id  | integer  |                                                        | plain    |              |
Indexes:
   "customers_pkey" PRIMARY KEY, btree (id)
   "customers_latest_purchase_id" btree (latest_purchase_id)
Foreign-key constraints:
   "customers_latest_purchase_fk" FOREIGN KEY (latest_purchase_id) REFERENCES purchases(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
   TABLE "purchases" CONSTRAINT "purchases_customer_fk" FOREIGN KEY (customer_id) REFERENCES customers(id) DEFERRABLE INITIALLY DEFERRED
Has OIDs: no

purchases

    Column   |  Type     |                        Modifiers                       | Storage  | Stats target | Description
--------------+-----------+--------------------------------------------------------+----------+--------------+-------------
id           | integer   | not null default nextval('purchases_id_seq'::regclass) | plain    |              |
customer_id  | integer   |                                                        | plain    |              |
Indexes:
   "purchases_pkey" PRIMARY KEY, btree (id)
   "purchases_customer_id_id" btree (customer_id, id)
   "purchases_customer_id" btree (customer_id)
Foreign-key constraints:
   "purchases_customer_fk" FOREIGN KEY (customer_id) REFERENCES customers(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
   TABLE "customers" CONSTRAINT "customers_latest_purchase_id" FOREIGN KEY (latest_purchase_id) REFERENCES purchases(id) DEFERRABLE INITIALLY DEFERRED
Has OIDs: no

DISTINCT ON詢問:

EXPLAIN ANALYZE SELECT DISTINCT ON (customer_id) id, customer_id FROM purchases ORDER BY customer_id DESC, id DESC;
Result  (cost=0.43..162516.37 rows=381 width=8) (actual time=0.050..1478.196 rows=823 loops=1)
  ->  Unique  (cost=0.43..162516.37 rows=381 width=8) (actual time=0.047..1477.754 rows=823 loops=1)
        ->  Index Only Scan Backward using purchases_customer_id_id on purchases  (cost=0.43..157850.96 rows=1866163 width=8) (actual time=0.045..1066.759 rows=1866132 loops=1)
              Heap Fetches: 1363529
Planning Time: 0.096 ms
Execution Time: 1478.408 ms

INNER JOIN查詢基於latest_purchase

EXPLAIN ANALYZE SELECT c.id, p.id FROM customers c JOIN purchases p ON c.latest_purchase = p.id;
Nested Loop  (cost=0.43..43877.27 rows=7594 width=8) (actual time=0.508..112.665 rows=755 loops=1)
  ->  Seq Scan on customers d  (cost=0.00..213.94 rows=7594 width=8) (actual time=0.006..2.861 rows=7594 loops=1)
  ->  Index Only Scan using customers_purchase_pkey on purchases p  (cost=0.43..5.75 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=7594)
        Index Cond: (id = c.latest_purchase)
        Heap Fetches: 583
Planning Time: 1.032 ms
Execution Time: 112.861 ms

這是答案:

每個客戶有很多(數千)次購買。

DISTINCT ON對於每個客戶的少量購買來說速度很快。看:

這應該更快

SELECT c.id AS customer_id, p.id AS purchase_id
FROM   customers c
LEFT   JOIN LATERAL (
  SELECT p.id
  FROM   purchases p
  WHERE  p.customer_id = c.id
  ORDER  BY p.id DESC
  LIMIT  1
  ) p ON true;

細微的差別:每個客戶都在結果中,即使根本沒有購買。

你的索引"purchases_customer_id_id" btree (customer_id, id)很適合這個。上的索引(customer_id, id DESC)會更好一些。

看:

除了1:

第1個圖顯示rows=823,第2個rows=755。建議您purchases.customer_id在 table 中沒有匹配項customers,這通常不應該。添加 FK 約束 from purchases.customer_idtocustomers.id和 makepurchases.customer_id NOT NULL以強制引用完整性。

除了2:

很多Heap Fetches在每個查詢計劃的末尾。你吸塵夠嗎。看:

引用自:https://dba.stackexchange.com/questions/267411