如何讓 Postgres 多列多表搜尋更高效

September 11, 2019

我有一個Shipment表，其中包含一些關於貨物的基本數據和一個ShipmentItem表，其中包含有關該貨物的附加屬性，表的主鍵foreignKey上有一個。toShipment表是關係。Shipment``ShipmentItem``OneToMany

我們需要包含一個文本搜尋選項，該選項採用給定的輸入文本字元串並蒐索 (make)Shipment的 2 個列以及三個特定types的ShipmentItemname 列。這是我目前的查詢：

select *
from Shipment shipment
where shipment.deliveryRequestedDate &gt;= '2019-06-09T00:00:00Z'
 and shipment.deliveryRequestedDate &lt;= '2019-12-06T23:59:59Z'
 and (
       shipment.identifierkeyvalues = '12345'
       or shipment.carrierReferenceNumber = '12345'
       or shipment.uuid in (
           select shipmentItem.resultId
           from ShipmentItem shipmentItem
           where (
               shipmentItem.type in (
                                     'poNumber', 'deliveryNoteNumber', 'salesOrderNumber'
               )
           )
           and shipmentItem.name = '12345'
           and shipmentItem.deliveryRequestedDate &gt;= '2019-06-09T00:00:00Z'
           and shipmentItem.deliveryRequestedDate &lt;= '2019-12-06T23:59:59Z'
      )
   )
limit 25

我發現的問題是將子查詢作為or條件之一的組合導致了主要的性能問題（即使子查詢本身通過利用type_name_deliveryRequestedDate該表上的索引快速返回。雖然我們在主表上有多個索引（identifierKeyValues, carrierReferenceNumber, , 甚至是查詢的所有三個 Shipment 列的索引，它只會使用deliveryRequestedDate效率極低的索引，因為這個查詢的範圍太大了。

將其轉換為 JOIN 似乎會導致相同的行為。我只是不太確定目前最好的方法是什麼。我們在此查詢上方有一個 Java Persistence API 層，因此希望盡可能避免對數據模型進行任何重大更改，但不確定最佳方法是什麼。任何想法將不勝感激！

解釋計劃：

Limit  (cost=110.61..209.98 rows=25 width=1370) (actual time=119503.030..124034.809 rows=1 loops=1)
     -&gt;  Index Scan using shipment_deliveryrequesteddate_idx on shipment shipment  (cost=110.61..890840.18 rows=224084 width=1370) (actual time=119503.027..124034.805 rows=1 loops=1)
           Index Cond: ((deliveryrequesteddate &gt;= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate &lt;= '2019-12-06 23:59:59'::timestamp without time zone))
           Filter: ((identifierkeyvalues = '12345'::text) OR (carrierreferencenumber = '12345'::text) OR (hashed SubPlan 1))
           Rows Removed by Filter: 496784
           SubPlan 1
             -&gt;  Index Scan using "type_name_deliveryRequestedDate" on resultitem shipmentitem  (cost=0.56..110.11 rows=24 width=16) (actual time=10.706..16.416 rows=1 loops=1)
                   Index Cond: ((type = ANY ('{poNumber,deliveryNoteNumber,salesOrderNumber}'::text[])) AND (name = '12345'::text) AND (deliveryrequesteddate &gt;= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate &lt;= '2019-12-06 23:59:59'::timestamp without time zone))
   Planning time: 3.175 ms
   Execution time: 124035.006 ms

解釋計劃刪除子查詢——為什麼它使用完全不同的索引？

Limit  (cost=9.51..273.71 rows=6 width=1370) (actual time=0.052..0.053 rows=0 loops=1)
 -&gt;  Bitmap Heap Scan on shipment shipment  (cost=9.51..273.71 rows=6 width=1370) (actual time=0.051..0.051 rows=0 loops=1)
       Recheck Cond: (((identifierkeyvalues = '12345'::text) AND (deliveryrequesteddate &gt;= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate &lt;= '2019-12-06 23:59:59'::timestamp without time zone)) OR (carrierreferencenumber = '12345'::text))
       Filter: ((deliveryrequesteddate &gt;= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate &lt;= '2019-12-06 23:59:59'::timestamp without time zone))
       Rows Removed by Filter: 2
       Heap Blocks: exact=2
       -&gt;  BitmapOr  (cost=9.51..9.51 rows=66 width=0) (actual time=0.041..0.041 rows=0 loops=1)
             -&gt;  Bitmap Index Scan on shipment_identifierkeyvalues_idx  (cost=0.00..4.61 rows=4 width=0) (actual time=0.023..0.024 rows=0 loops=1)
                   Index Cond: ((identifierkeyvalues = '12345'::text) AND (deliveryrequesteddate &gt;= '2019-06-09 00:00:00'::timestamp without time zone) AND (deliveryrequesteddate &lt;= '2019-12-06 23:59:59'::timestamp without time zone))
             -&gt;  Bitmap Index Scan on shipment_carrierreferencenumber_idx  (cost=0.00..4.90 rows=62 width=0) (actual time=0.016..0.016 rows=2 loops=1)
                   Index Cond: (carrierreferencenumber = '12345'::text)
Planning time: 1.668 ms
Execution time: 0.116 ms

它不能使用 BitmapOr 對不同的表進行掃描（或者至少，它沒有被編碼為能夠做到這一點 - 如果有人投入工作，它可能會這樣做 - 它會必須在另一個表中查找UUID，然後將它們轉換為ipso表上的tid並將它們填充到點陣圖中），因此無法使用BitmapOr計劃。
您最好的選擇可能是將其編寫為兩個不同查詢的 UNION ALL，一個只命中單個表，一個同時命中兩個表。

引用自：https://dba.stackexchange.com/questions/248541

如何讓 Postgres 多列多表搜尋更高效

相關問答

使用 GIN 索引位串

大表的高效分頁

複合索引是否也適用於第一個欄位的查詢？

如何加快查詢時間序列中的最後一個值？

具有最新時間戳的行

索引：如果節點數相同，則整數與字元串的性能