Postgresql

如何讓 Postgres 多列多表搜尋更高效

  • September 11, 2019

我有一個Shipment表,其中包含一些關於貨物的基本數據和一個ShipmentItem表,其中包含有關該貨物的附加屬性,表的主鍵foreignKey上有一個。toShipment表是關係。Shipment``ShipmentItem``OneToMany

我們需要包含一個文本搜尋選項,該選項採用給定的輸入文本字元串並蒐索 (make)Shipment的 2 個列以及三個特定typesShipmentItemname 列。這是我目前的查詢:

select *
from Shipment shipment
where shipment.deliveryRequestedDate >= '2019-06-09T00:00:00Z'
 and shipment.deliveryRequestedDate <= '2019-12-06T23:59:59Z'
 and (
       shipment.identifierkeyvalues = '12345'
       or shipment.carrierReferenceNumber = '12345'
       or shipment.uuid in (
           select shipmentItem.resultId
           from ShipmentItem shipmentItem
           where (
               shipmentItem.type in (
                                     'poNumber', 'deliveryNoteNumber', 'salesOrderNumber'
               )
           )
           and shipmentItem.name = '12345'
           and shipmentItem.deliveryRequestedDate >= '2019-06-09T00:00:00Z'
           and shipmentItem.deliveryRequestedDate <= '2019-12-06T23:59:59Z'
      )
   )
limit 25

我發現的問題是將子查詢作為or條件之一的組合導致了主要的性能問題(即使子查詢本身通過利用type_name_deliveryRequestedDate該表上的索引快速返回。雖然我們在主表上有多個索引(identifierKeyValues, carrierReferenceNumber, , 甚至是查詢的所有三個 Shipment 列的索引,它只會使用deliveryRequestedDate效率極低的索引,因為這個查詢的範圍太大了。

將其轉換為 JOIN 似乎會導致相同的行為。我只是不太確定目前最好的方法是什麼。我們在此查詢上方有一個 Java Persistence API 層,因此希望盡可能避免對數據模型進行任何重大更改,但不確定最佳方法是什麼。任何想法將不勝感激!

解釋計劃:

Limit  (cost=110.61..209.98 rows=25 width=1370) (actual time=119503.030..124034.809 rows=1 loops=1)
     ->  Index Scan using shipment_deliveryrequesteddate_idx on shipment shipment  (cost=110.61..890840.18 rows=224084 width=1370) (actual time=119503.027..124034.805 rows=1 loops=1)
           Index Cond: ((deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone))
           Filter: ((identifierkeyvalues = '12345'::text) OR (carrierreferencenumber = '12345'::text) OR (hashed SubPlan 1))
           Rows Removed by Filter: 496784
           SubPlan 1
             ->  Index Scan using "type_name_deliveryRequestedDate" on resultitem shipmentitem  (cost=0.56..110.11 rows=24 width=16) (actual time=10.706..16.416 rows=1 loops=1)
                   Index Cond: ((type = ANY ('{poNumber,deliveryNoteNumber,salesOrderNumber}'::text[])) AND (name = '12345'::text) AND (deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone))
   Planning time: 3.175 ms
   Execution time: 124035.006 ms

解釋計劃刪除子查詢——為什麼它使用完全不同的索引?

Limit  (cost=9.51..273.71 rows=6 width=1370) (actual time=0.052..0.053 rows=0 loops=1)
 ->  Bitmap Heap Scan on shipment shipment  (cost=9.51..273.71 rows=6 width=1370) (actual time=0.051..0.051 rows=0 loops=1)
       Recheck Cond: (((identifierkeyvalues = '12345'::text) AND (deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone)) OR (carrierreferencenumber = '12345'::text))
       Filter: ((deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone))
       Rows Removed by Filter: 2
       Heap Blocks: exact=2
       ->  BitmapOr  (cost=9.51..9.51 rows=66 width=0) (actual time=0.041..0.041 rows=0 loops=1)
             ->  Bitmap Index Scan on shipment_identifierkeyvalues_idx  (cost=0.00..4.61 rows=4 width=0) (actual time=0.023..0.024 rows=0 loops=1)
                   Index Cond: ((identifierkeyvalues = '12345'::text) AND (deliveryrequesteddate >= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate <= '2019-12-06 23:59:59'::timestamp without time zone))
             ->  Bitmap Index Scan on shipment_carrierreferencenumber_idx  (cost=0.00..4.90 rows=62 width=0) (actual time=0.016..0.016 rows=2 loops=1)
                   Index Cond: (carrierreferencenumber = '12345'::text)
Planning time: 1.668 ms
Execution time: 0.116 ms

它不能使用 BitmapOr 對不同的表進行掃描(或者至少,它沒有被編碼為能夠做到這一點 - 如果有人投入工作,它可能會這樣做 - 它會必須在另一個表中查找UUID,然後將它們轉換為ipso表上的tid並將它們填充到點陣圖中),因此無法使用BitmapOr計劃。

您最好的選擇可能是將其編寫為兩個不同查詢的 UNION ALL,一個只命中單個表,一個同時命中兩個表。

引用自:https://dba.stackexchange.com/questions/248541