Postgresql
為什麼我的索引 DISTINCT ON 比我的 INNER JOIN 慢得多?
我有兩張桌子,
customers
和purchases
。每個客戶有很多(數千)次購買。我通常只需要每個客戶的最新購買,這就是為什麼我有latest_purchase_id
列並在我添加購買時使用觸發器更新它(請參閱https://dba.stackexchange.com/a/243988/186435)。我寧願不使用觸發器,所以我嘗試使用
DISTINCT ON
帶有索引的查詢,但它要慢得多,我不知道為什麼。表
customers
:Column | Type | Modifiers | Storage | Stats target | Description ---------------------+----------+--------------------------------------------------------+----------+--------------+------------- id | integer | not null default nextval('customers_id_seq'::regclass) | plain | | latest_purchase_id | integer | | plain | | Indexes: "customers_pkey" PRIMARY KEY, btree (id) "customers_latest_purchase_id" btree (latest_purchase_id) Foreign-key constraints: "customers_latest_purchase_fk" FOREIGN KEY (latest_purchase_id) REFERENCES purchases(id) DEFERRABLE INITIALLY DEFERRED Referenced by: TABLE "purchases" CONSTRAINT "purchases_customer_fk" FOREIGN KEY (customer_id) REFERENCES customers(id) DEFERRABLE INITIALLY DEFERRED Has OIDs: no
表
purchases
:Column | Type | Modifiers | Storage | Stats target | Description --------------+-----------+--------------------------------------------------------+----------+--------------+------------- id | integer | not null default nextval('purchases_id_seq'::regclass) | plain | | customer_id | integer | | plain | | Indexes: "purchases_pkey" PRIMARY KEY, btree (id) "purchases_customer_id_id" btree (customer_id, id) "purchases_customer_id" btree (customer_id) Foreign-key constraints: "purchases_customer_fk" FOREIGN KEY (customer_id) REFERENCES customers(id) DEFERRABLE INITIALLY DEFERRED Referenced by: TABLE "customers" CONSTRAINT "customers_latest_purchase_id" FOREIGN KEY (latest_purchase_id) REFERENCES purchases(id) DEFERRABLE INITIALLY DEFERRED Has OIDs: no
DISTINCT ON
詢問:EXPLAIN ANALYZE SELECT DISTINCT ON (customer_id) id, customer_id FROM purchases ORDER BY customer_id DESC, id DESC; Result (cost=0.43..162516.37 rows=381 width=8) (actual time=0.050..1478.196 rows=823 loops=1) -> Unique (cost=0.43..162516.37 rows=381 width=8) (actual time=0.047..1477.754 rows=823 loops=1) -> Index Only Scan Backward using purchases_customer_id_id on purchases (cost=0.43..157850.96 rows=1866163 width=8) (actual time=0.045..1066.759 rows=1866132 loops=1) Heap Fetches: 1363529 Planning Time: 0.096 ms Execution Time: 1478.408 ms
INNER JOIN
查詢基於latest_purchase
:EXPLAIN ANALYZE SELECT c.id, p.id FROM customers c JOIN purchases p ON c.latest_purchase = p.id; Nested Loop (cost=0.43..43877.27 rows=7594 width=8) (actual time=0.508..112.665 rows=755 loops=1) -> Seq Scan on customers d (cost=0.00..213.94 rows=7594 width=8) (actual time=0.006..2.861 rows=7594 loops=1) -> Index Only Scan using customers_purchase_pkey on purchases p (cost=0.43..5.75 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=7594) Index Cond: (id = c.latest_purchase) Heap Fetches: 583 Planning Time: 1.032 ms Execution Time: 112.861 ms
這是答案:
每個客戶有很多(數千)次購買。
DISTINCT ON
對於每個客戶的少量購買來說速度很快。看:這應該更快:
SELECT c.id AS customer_id, p.id AS purchase_id FROM customers c LEFT JOIN LATERAL ( SELECT p.id FROM purchases p WHERE p.customer_id = c.id ORDER BY p.id DESC LIMIT 1 ) p ON true;
細微的差別:每個客戶都在結果中,即使根本沒有購買。
你的索引
"purchases_customer_id_id" btree (customer_id, id)
很適合這個。上的索引(customer_id, id DESC)
會更好一些。看:
除了1:
第1個圖顯示
rows=823
,第2個rows=755
。建議您purchases.customer_id
在 table 中沒有匹配項customers
,這通常不應該。添加 FK 約束 frompurchases.customer_id
tocustomers.id
和 makepurchases.customer_id NOT NULL
以強制引用完整性。除了2:
很多
Heap Fetches
在每個查詢計劃的末尾。你吸塵夠嗎。看: