如何讓 Postgres 多列多表搜尋更高效
我有一個
Shipment
表,其中包含一些關於貨物的基本數據和一個ShipmentItem
表,其中包含有關該貨物的附加屬性,表的主鍵foreignKey
上有一個。toShipment
表是關係。Shipment``ShipmentItem``OneToMany
我們需要包含一個文本搜尋選項,該選項採用給定的輸入文本字元串並蒐索 (make)
Shipment
的 2 個列以及三個特定types
的ShipmentItem
name 列。這是我目前的查詢:select * from Shipment shipment where shipment.deliveryRequestedDate >= '2019-06-09T00:00:00Z' and shipment.deliveryRequestedDate <= '2019-12-06T23:59:59Z' and ( shipment.identifierkeyvalues = '12345' or shipment.carrierReferenceNumber = '12345' or shipment.uuid in ( select shipmentItem.resultId from ShipmentItem shipmentItem where ( shipmentItem.type in ( 'poNumber', 'deliveryNoteNumber', 'salesOrderNumber' ) ) and shipmentItem.name = '12345' and shipmentItem.deliveryRequestedDate >= '2019-06-09T00:00:00Z' and shipmentItem.deliveryRequestedDate <= '2019-12-06T23:59:59Z' ) ) limit 25
我發現的問題是將子查詢作為
or
條件之一的組合導致了主要的性能問題(即使子查詢本身通過利用type_name_deliveryRequestedDate
該表上的索引快速返回。雖然我們在主表上有多個索引(identifierKeyValues
,carrierReferenceNumber
, , 甚至是查詢的所有三個 Shipment 列的索引,它只會使用deliveryRequestedDate
效率極低的索引,因為這個查詢的範圍太大了。將其轉換為 JOIN 似乎會導致相同的行為。我只是不太確定目前最好的方法是什麼。我們在此查詢上方有一個 Java Persistence API 層,因此希望盡可能避免對數據模型進行任何重大更改,但不確定最佳方法是什麼。任何想法將不勝感激!
解釋計劃:
Limit (cost=110.61..209.98 rows=25 width=1370) (actual time=119503.030..124034.809 rows=1 loops=1) -> Index Scan using shipment_deliveryrequesteddate_idx on shipment shipment (cost=110.61..890840.18 rows=224084 width=1370) (actual time=119503.027..124034.805 rows=1 loops=1) Index Cond: ((deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone)) Filter: ((identifierkeyvalues = '12345'::text) OR (carrierreferencenumber = '12345'::text) OR (hashed SubPlan 1)) Rows Removed by Filter: 496784 SubPlan 1 -> Index Scan using "type_name_deliveryRequestedDate" on resultitem shipmentitem (cost=0.56..110.11 rows=24 width=16) (actual time=10.706..16.416 rows=1 loops=1) Index Cond: ((type = ANY ('{poNumber,deliveryNoteNumber,salesOrderNumber}'::text[])) AND (name = '12345'::text) AND (deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone)) Planning time: 3.175 ms Execution time: 124035.006 ms
解釋計劃刪除子查詢——為什麼它使用完全不同的索引?
Limit (cost=9.51..273.71 rows=6 width=1370) (actual time=0.052..0.053 rows=0 loops=1) -> Bitmap Heap Scan on shipment shipment (cost=9.51..273.71 rows=6 width=1370) (actual time=0.051..0.051 rows=0 loops=1) Recheck Cond: (((identifierkeyvalues = '12345'::text) AND (deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone)) OR (carrierreferencenumber = '12345'::text)) Filter: ((deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone)) Rows Removed by Filter: 2 Heap Blocks: exact=2 -> BitmapOr (cost=9.51..9.51 rows=66 width=0) (actual time=0.041..0.041 rows=0 loops=1) -> Bitmap Index Scan on shipment_identifierkeyvalues_idx (cost=0.00..4.61 rows=4 width=0) (actual time=0.023..0.024 rows=0 loops=1) Index Cond: ((identifierkeyvalues = '12345'::text) AND (deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone)) -> Bitmap Index Scan on shipment_carrierreferencenumber_idx (cost=0.00..4.90 rows=62 width=0) (actual time=0.016..0.016 rows=2 loops=1) Index Cond: (carrierreferencenumber = '12345'::text) Planning time: 1.668 ms Execution time: 0.116 ms
它不能使用 BitmapOr 對不同的表進行掃描(或者至少,它沒有被編碼為能夠做到這一點 - 如果有人投入工作,它可能會這樣做 - 它會必須在另一個表中查找UUID,然後將它們轉換為ipso表上的tid並將它們填充到點陣圖中),因此無法使用BitmapOr計劃。
您最好的選擇可能是將其編寫為兩個不同查詢的 UNION ALL,一個只命中單個表,一個同時命中兩個表。