Postgresql

優化將表的內部連接與其自身進行比較的查詢

  • May 12, 2018

我有一個obj_param_value_log具有以下結構的表:

object_id:   int
param_id:    int
date:        date
value:       real

因此,它儲存每天為與對象相關的不同參數(m2m 關係)擷取的值。對象具有不同的參數集,並且某些對象參數在某些日期的值可能存在差異。

現在,我需要建構一個查詢來查找符合以下範例條件的對象:

param1_value as of two days ago > param2_value as of week low
and sustained within past week

因此,它會查找過去 2 天的 param1 的值大於過去 7 天計算的 param2 的最小值的所有對象,並且該條件今天為真,並且在前 6 天中的每一天都為真。

到目前為止,我能想到的最好的查詢如下:

select opvl_exist.object_id from obj_param_value_log opvl_exist
where (select count(distinct opvl_sust1.date) from obj_param_value_log opvl_sust1 /*count dates from inner join to account for possible gaps in values for either parameter*/
inner join obj_param_value_log opvl_sust2 on opvl_sust1.object_id=opvl_sust2.object_id
where opvl_sust1.param_id=2 /*define param1*/
and   opvl_sust2.param_id=32 /*define param2*/
and   opvl_sust1.date between current_date - integer '2' - integer '6' and current_date - integer '2' /*set a lookup period for param1 from a week and 2 days ago to just 2 days ago*/
and   opvl_sust2.date=(select min(opvl_sust_sq.value) from obj_param_value_log opvl_sust_sq
               where opvl_sust2.object_id=opvl_sust_sq.object_id
               and   opvl_sust_sq.param_id=32
               and   opvl_sust_sq.date between opvl_sust1.date - integer '4' and opvl_sust1.date + integer '2') /*set a relative lookup period to calc the min value for param2 form 4 days before param1 date to 2 days ahead param1 date*/
and   opvl_sust2.date>=current_date - integer '6' - integer '6' /*set a lookup period for param2 from today to a week ago (min value lookup) and another week ago (condition sustained period)*/
and   opvl_exist.object_id=opvl_sust1.object_id) = 
(select count(distinct opvl1.date) from obj_param_value_log opvl1 /*count dates where filter condition is true*/
inner join obj_param_value_log opvl2 on opvl1.object_id=opvl2.object_id
where opvl1.param_id=2 /*define param1*/
and   opvl2.param_id=32 /*define param2*/
and   opvl1.date between current_date - integer '2' - integer '6' and current_date - integer '2' /*set a lookup period for param1 from a week and 2 days ago to just 2 days ago*/
and   opvl2.value=(select min(opvl_sq.value) from obj_param_value_log opvl_sq
               where opvl2.object_id=opvl_sq.object_id
               and   opvl_sq.param_id=32
               and   opvl_sq.date between opvl1.date - integer '4' and opvl1.date + integer '2') /*set a relative lookup period to calc the min value for param2 form 4 days before param1 date to 2 days ahead param1 date*/
and opvl2.date>=current_date - integer '6' - integer '6' /*set a lookup period for param2 from today to a week ago (min value lookup) and another week ago (condition sustained period)*/
and opvl1.value>opvl2.value /*set condition param1 value > param2 value*/
and opvl_exist.object_id=opvl1.object_id)

查詢背後的邏輯是比較兩組日期 - 一組日期的參數值與原始條件匹配,另一組參數及其值滿足條件。如果這些集合相同,則意味著所有相關日期都滿足條件(即查找範圍內具有正確參數和值的日期)。

這似乎找到了我正在尋找的東西,但是存在嚴重的性能問題,因為查詢已經需要大約一分鐘才能在大約 100k 行上執行並且我正在計劃數百萬行。

**尋找查詢優化建議和索引建議。**到目前為止,我在 和 上有單獨object_idparam_id索引date

我仍處於設計階段,所以如果您看到通過重新訪問資料結構來顯著提高性能的方法,我很樂意探索這條路線。

如果您有任何問題或想獲得任何其他詳細資訊以幫助您提出想法,請告訴我。任何輸入表示讚賞!

附加資訊

儘管我對如何大大簡化並據稱提高查詢性能有一個很好的答案,但我想要求更多 DBA 輸入,以查看所描述的業務邏輯是否以最有效的方式實現,並找到更多選項來增加的表現。

PS 我自己的測試表明, ( object_id, param_id, date) 上的複合索引對整體性能有積極影響。

在嘗試不同的實現選項並嘗試各種索引後,我相信我已經找到了一個不錯的解決方案,該解決方案在大約 1000 萬行上證明了它的性能:

select opj_dict.object_id from obj_dict where exists (
select 1 from from obj_param_value_log opvl_exist
where opvl_exist.param_id=2 /*limit the scope of the exist query to param1*/
and opvl_exist.date between current_date - integer '2' - integer '6' and current_date - integer '2' /*limit the scope of the exist query to the lookup period of param1*/
group by opvl_exist.object_id
having count(distinct opvl_exist.date) = (
select count(distinct opvl1.date)
from obj_param_value_log opvl1
inner join obj_param_value_log opvl2 on opvl1.object_id=opvl2.object_id
where opvl1.param_id=2 /*set param1*/
and   opvl2.param_id=32 /*set param2*/
and opvl1.date between current_date - integer '2' - integer '6' and current_date - integer '2' /*set a lookup period for param1 from a week and 2 days ago to just 2 days ago*/
and opvl2.value=(select min(opvl_sq.value) from obj_param_value_log opvl_sq
                       where opvl2.object_id=opvl_sq.object_id
                       and     opvl_sq.param_id=32
                       and     opvl_sq.date between opvl1.date - integer '4' and opvl1.date + integer '2') /*set a relative lookup period to calc the min value for param2 form 4 days before param1 date to 2 days ahead param1 date*/
and     opvl2.date>=current_date - integer '6' - integer '6'  /*set a lookup period for param2 from today to a week ago (min value lookup) and another week ago (condition sustained period)*/
and opvl1.value>opvl2.value
and opvl_exist.object_id=opvl1.object_id)
and obj_dict.object_id=opvl_exist.object_id)

obj_param_value_log在(列順序很重要)上創建以下索引時,這在 1 秒內有效:

  • create index date_objid_paramid on obj_param_value_log (date, object_id, param_id);
  • create index objid_paramid_date_val on obj_param_value_log (object_id, param_id, date, value);

最後一個問題對我來說變得很困難,所以希望我在解決這個問題上的努力能幫助那些發現自己陷入類似困境的人。

這不是對同一邏輯的不同實現的嘗試。這只是簡化(並希望加快)現有查詢的嘗試。

我注意到的是,正在比較的兩個相關子查詢只有一個區別。第二個子查詢在其where子句中有一個額外的條件,即這個:

and opvl1.value>opvl2.value

鑑於這是唯一的區別,可以只count(distinct)使用一個這樣的子查詢來計算這兩個結果。您只需要在 second 內移動該額外條件count(),如下所示:

count(distinct case when opvl1.value>opvl2.value then opvl1.date end)

比較本身會發生在子查詢的select子句中:

select ... /* main query */
where
 (
   select
     count(distinct opvl1.date) =
     count(distinct case when opvl1.value>opvl2.value then opvl1.date end)
   from
                obj_param_value_log opvl1 /*count dates where filter condition is true*/
     inner join obj_param_value_log opvl2 on opvl1.object_id=opvl2.object_id
   where
         opvl1.param_id=2 /*define param1*/
     and opvl2.param_id=32 /*define param2*/
     and opvl1.date between current_date - integer '2' - integer '6' and current_date - integer '2' /*set a lookup period for param1 from a week and 2 days ago to just 2 days ago*/
     and opvl2.value =
     (
       select
         min(opvl_sq.value)
       from
         obj_param_value_log opvl_sq
       where opvl2.object_id=opvl_sq.object_id
         and opvl_sq.param_id=32
         and opvl_sq.date between opvl1.date - integer '4' and opvl1.date + integer '2'
     ) /*set a relative lookup period to calc the min value for param2 form 4 days before param1 date to 2 days ahead param1 date*/
     and opvl2.date>=current_date - integer '6' - integer '6' /*set a lookup period for param2 from today to a week ago (min value lookup) and another week ago (condition sustained period)*/
     and opvl_exist.object_id=opvl1.object_id
 )
;

可能仍有進一步改進的空間,但這可能是一個開始。

引用自:https://dba.stackexchange.com/questions/204648