PostgreSQL

Combine separate columns into the smallest relevant range

  • April 24, 2019

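I am trying to discard multiple records, which may or may not overlap, based on the smallest possible contiguous range. I want to do something similar to this, but the ranges are numeric strings stored in separate columns, and I have 4 more fields in the same query. I only need to get the record with the smallest range.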

Data with simplified fields:

create table invoices(
   eventname varchar,
   /*...many fields*/
   quantity varchar,
   section varchar,
   rownumber varchar,
   secondrow varchar,
   lowseat varchar,
   highseat varchar,
   /*...some more fields*/
   status varchar,
   /*...even more fields*/
   created_at timestamp default now() not null,
   updated_at timestamp
);

INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '2', '227', '15', null, '9', '10', 'DEPLETED' ,  '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '2', '227', '15', null, '7', '8', 'DEPLETED'  ,  '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '2', '227', '14', null, '23', '24', 'DEPLETED',  '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '1', '227', '13', null, '21', '21', 'DEPLETED',  '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '8', '227', '14', null, '15', '22', 'DEPLETED',  '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '1', '227', '14', null, '1', '1', 'DEPLETED',    '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '2', 'A57', 'GA', null, '1', '2', 'DEPLETED',    '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 4 (Miami vs North Carolina and Duke vs Notre Dame)', '3', 'A57', 'GA', null, '3', '5', 'DEPLETED',    '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('2018 ACC Basketball Tournament - Session 5 (Virginia vs. Clemson and Duke vs. North Carolina)', '3', '228', '14', null, '1', '3', 'DEPLETED', '2019-02-06 00:46:13.286828', null);
INSERT INTO public.invoices (eventname, quantity, section, rownumber, secondrow, lowseat, highseat, status, created_at, updated_at) VALUES ('Penn State Nittany Lions at Pittsburgh Panthers', '2', '227', 'K', null, '25', '26', 'DEPLETED', '2019-02-06 00:46:13.286828', null);

Visual representation:

Group 1

1 | =====
2 |   ===  --> take this record with all its values

Group 2

3 |    === --> take this record

Group 3

4 |       =======
5 |           ==  --> take this record
6 |         =====

  • Adjacent ranges should be merged.
  • Inclusive lower and upper bounds do indeed fit seat numbers best.

I tried the following, but it returns the same value for everything, so I know it is not right:

SELECT distinct section, rownumber,
min(COALESCE(lowseat, '')) over 
(partition by grp) as lowseat,
max(maxhighseat) over (partition by grp) AS highseat
FROM  (
SELECT *, count(nextstart > maxhighseat OR NULL) OVER (PARTITION BY section,
rownumber ORDER BY lowseat desc, highseat desc NULLS LAST) AS grp
FROM  (
 SELECT section, rownumber, lowseat, highseat, max(COALESCE(highseat, '')) OVER (PARTITION BY section, rownumber ORDER BY lowseat, highseat) AS maxhighseat
      , lead(lowseat) OVER (PARTITION BY section, rownumber ORDER BY lowseat, highseat) As nextstart
 FROM invoices where status <> 'DEPLETED' and eventname like 'UCLA%'
 ) a
) b
ORDER  BY 1;

The important fields of the table look like this:

id | section | row | lowseat | highseat | created_at
---+---------+-----+---------+----------+-------------------------
 1 |      14 |  18 |       1 |       15 | 2019-01-01T00:00:00.000Z
 2 |      14 |  18 |       4 |       15 | 2019-01-01T00:00:00.000Z
 3 |      12 |  13 |       2 |       13 | 2019-02-01T00:00:00.000Z
 4 |      14 |  18 |       4 |       12 | 2019-01-01T00:00:00.000Z

This is a classic gaps-and-islands problem. The question itself still has quite a few gaps, no pun intended. Filling in some ...

Assumptions

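  • lowseat and highseat seem to be the lower and upper bounds of your ranges, evidently **integer** numbers, but stored as varchar. Change that, or you will have to add type casts to my query below.
  • You did not define whether **adjacent ranges** should be merged or kept separate. Assuming separate, since they do not strictly "overlap".
  • Assuming lower and upper bounds to be inclusive, which fits seat numbers best.
  • Ignoring the query predicates, which do not match the sample data.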
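
To illustrate the first assumption, here is a minimal sketch of why the varchar storage matters and how the columns could be converted. It assumes every stored lowseat and highseat value is a valid integer literal, which holds for the sample data but is not guaranteed for the real table.

```sql
-- Text comparison works character by character, so '9' sorts after '10'.
SELECT '9'::varchar < '10'::varchar AS text_cmp   -- false under typical collations
     , 9 < 10                       AS int_cmp;   -- true

-- One-time conversion (assumption: all values are valid integer literals;
-- the statement fails on the first value that is not).
ALTER TABLE invoices
   ALTER COLUMN lowseat  TYPE integer USING lowseat::integer
 , ALTER COLUMN highseat TYPE integer USING highseat::integer;
```

If changing the column types is not an option, the same casts can be written inline wherever lowseat and highseat are compared or subtracted.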

Query

SELECT DISTINCT ON (island) *
FROM  (
  SELECT *
       , highseat - lowseat AS len -- off by 1, but irrelevant
       , count(gap) OVER (ORDER BY rn) AS island
  FROM  (
     SELECT *
          , (lowseat > max(highseat) OVER w) OR NULL AS gap
          , row_number() OVER w AS rn
     FROM   invoices
     WINDOW w AS (ORDER BY lowseat, highseat DESC  -- longest range 1st
                  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
     ) sub1
  ) sub2
ORDER  BY island, len, lowseat;   -- break ties by picking smallest numbers
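
To see how the two inner steps of the query above number the islands, the following sketch runs them on six hypothetical ranges that loosely follow the visual representation in the question. The ids and seat numbers are made up for illustration.

```sql
WITH t(id, lowseat, highseat) AS (
   VALUES (1, 1, 5), (2, 3, 5)                   -- overlap each other
        , (3, 7, 9)                              -- stands alone
        , (4, 11, 17), (5, 15, 16), (6, 13, 17)  -- overlap each other
)
SELECT id, lowseat, highseat, gap
     , count(gap) OVER (ORDER BY rn) AS island   -- running count of gaps so far
FROM  (
   SELECT *
          -- true if this range starts after every earlier range has ended, else NULL
        , (lowseat > max(highseat) OVER w) OR NULL AS gap
        , row_number() OVER w AS rn
   FROM   t
   WINDOW w AS (ORDER BY lowseat, highseat DESC
                ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
   ) sub
ORDER  BY rn;
-- ids 1 and 2 get island 0, id 3 gets island 1, ids 4, 6 and 5 get island 2
```

Because count() ignores NULL, only rows flagged with gap = true advance the island number. DISTINCT ON then keeps one row per island.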

db<>fiddle here

This is based on lowseat and highseat only; the rest of the row is just ballast.
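
If islands should be computed separately per seating row rather than across the whole table, one possible variation is sketched below. The grouping key (eventname, section, rownumber) is an assumption and is not part of the query above; the inline casts assume the columns are still varchar and hold valid integers. The aliases lo and hi are introduced here for illustration only.

```sql
SELECT DISTINCT ON (eventname, section, rownumber, island) *
FROM  (
   SELECT *
        , hi - lo AS len
        , count(gap) OVER (PARTITION BY eventname, section, rownumber
                           ORDER BY rn) AS island
   FROM  (
      SELECT *
           , lowseat::int  AS lo
           , highseat::int AS hi
             -- to merge adjacent ranges as well, compare against max(...) + 1
           , (lowseat::int > max(highseat::int) OVER w) OR NULL AS gap
           , row_number() OVER w AS rn
      FROM   invoices
      WINDOW w AS (PARTITION BY eventname, section, rownumber
                   ORDER BY lowseat::int, highseat::int DESC
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
      ) sub1
   ) sub2
ORDER  BY eventname, section, rownumber, island, len, lo;
```

The helper columns lo, hi, gap, rn, len and island appear in the output because of SELECT *; list the wanted columns explicitly to drop them.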

Related answer with more explanation and an alternative procedural implementation:

About DISTINCT ON:
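
As a minimal illustration of DISTINCT ON, the sketch below keeps the shortest range per group from a few hypothetical rows. The column grp and the values are made up. DISTINCT ON keeps the first row of each group according to ORDER BY, so the leading ORDER BY expressions must match the DISTINCT ON expressions.

```sql
SELECT DISTINCT ON (grp) grp, lowseat, highseat
FROM  (
   VALUES (1, 1, 15), (1, 4, 15), (1, 4, 12), (2, 2, 13)
   ) AS t(grp, lowseat, highseat)
ORDER  BY grp, highseat - lowseat, lowseat;  -- shortest range first within each grp
-- returns (1, 4, 12) and (2, 2, 13)
```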

Source: https://dba.stackexchange.com/questions/228828