PostgreSQL

Counting and grouping over multiple OUTER JOINs

  • March 15, 2021

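I am creating an advertisement log view (impressions, clicks, and clicks per impression). I have a simple table structure and some working queries, but I am having trouble combining them into a single query that can be used as a view (not a materialized view, since this will be live data).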

The tables are:

CREATE TABLE advert
(
 id integer NOT NULL PRIMARY KEY
);

CREATE TABLE advert_event
(
 code CHAR(1) NOT NULL PRIMARY KEY
);

CREATE TABLE advert_log
(
 advertisement integer NOT NULL REFERENCES advert(id),
 event_code CHAR(1) NOT NULL REFERENCES advert_event(code)
);

And some sample data that covers all the possible cases:

INSERT INTO advert VALUES (1);
INSERT INTO advert VALUES (2);
INSERT INTO advert VALUES (3);
INSERT INTO advert VALUES (4);

INSERT INTO advert_event VALUES ('I'); -- Impression
INSERT INTO advert_event VALUES ('C'); -- Click

INSERT INTO advert_log VALUES (1, 'I');
INSERT INTO advert_log VALUES (1, 'C');
INSERT INTO advert_log VALUES (2, 'I');
INSERT INTO advert_log VALUES (2, 'I');
INSERT INTO advert_log VALUES (2, 'C');
INSERT INTO advert_log VALUES (3, 'I');
INSERT INTO advert_log VALUES (3, 'I');

For reference, here is the full set of advertisement/event combinations I want to count in advert_log:

Query A.

SELECT * FROM advert,advert_event;

Result A.

id | code
----+------
 1 | I
 1 | C
 2 | I
 2 | C
 3 | I
 3 | C
 4 | I
 4 | C
(8 rows)

Event counts per advertisement:

Query B.

SELECT DISTINCT advertisement,event_code,COUNT(*) OVER (PARTITION BY advertisement,event_code) FROM advert_log;

Result B.

advertisement | event_code | count
---------------+------------+-------
            1 | I          |     1
            1 | C          |     1
            2 | I          |     2
            2 | C          |     1
            3 | I          |     2
(5 rows)
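
As a side note, the DISTINCT plus window-function form used in query B should be equivalent to a plain GROUP BY. A minimal sketch against the same advert_log table (the ORDER BY is an addition here, only to make the output order deterministic):

```sql
-- One row per (advertisement, event_code) pair that actually occurs in the log.
-- Pairs with no log rows are still missing, exactly as in query B.
SELECT advertisement, event_code, COUNT(*) AS count
FROM advert_log
GROUP BY advertisement, event_code
ORDER BY advertisement, event_code;
```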

For any single advertisement, the correct counts (including zero) can be obtained with the following queries:

Query C1.

SELECT COUNT(*) FROM advert_log WHERE advertisement=4 AND event_code='I';
count
-------
    0
(1 row)

Query C2.

SELECT COUNT(*) FROM advert_log WHERE advertisement=4 AND event_code='C';
count
-------
    0
(1 row)

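Of course, my earlier query (query B) does not include zero counts, so it catches neither of the two cases above.

Ultimately, what I am trying to do is turn the numbers above into the following, where the cpi column is clicks ('C' entries) divided by impressions ('I' entries):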

advertisement | impressions | clicks | cpi
---------------+-------------+--------+-----
            1 |           1 |      1 | 1.0
            2 |           2 |      1 | 0.5
            3 |           2 |      0 | 0.0
            4 |           0 |      0 | 0.0 <- or NULL, NaN, 1.0, ...

My initial approach was to create a function for queries C1 and C2 and call that function from a view based on query A.
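
A minimal sketch of what such a function-based approach might look like. The function name advert_event_count and its parameter names are hypothetical; they do not appear in the original post.

```sql
-- Hypothetical helper: counts log rows for one advertisement and one event code.
-- Returns 0 (not NULL) when there are no matching rows, like queries C1 and C2.
CREATE FUNCTION advert_event_count(ad integer, ev char(1))
RETURNS bigint
LANGUAGE sql STABLE AS $$
  SELECT COUNT(*) FROM advert_log
  WHERE advertisement = ad AND event_code = ev;
$$;

-- Call the helper once per event type for every advertisement.
SELECT a.id AS advertisement,
       advert_event_count(a.id, 'I') AS impressions,
       advert_event_count(a.id, 'C') AS clicks
FROM advert a
ORDER BY a.id;
```

This runs two extra lookups per advertisement, which is part of the motivation for looking for a single-query solution.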

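I suspect there is a simpler way to achieve my goal with a single query.

While writing up the question I was able to find a solution, but I decided to post both the question and the answer in case they help someone else in the future. Also, if anyone has a simpler or better-performing solution to this problem, I would be happy to hear it.

I managed to solve my NULL-count problem by using a CROSS JOIN before the OUTER JOIN: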

SELECT * FROM advert CROSS JOIN advert_event LEFT OUTER JOIN advert_log ON advert_log.advertisement=advert.id AND advert_log.event_code=advert_event.code;

id | code | advertisement | event_code
----+------+---------------+------------
 1 | I    |             1 | I
 1 | C    |             1 | C
 2 | I    |             2 | I
 2 | I    |             2 | I
 2 | C    |             2 | C
 3 | I    |             3 | I
 3 | I    |             3 | I
 3 | C    |               |
 4 | I    |               |
 4 | C    |               |
(10 rows)

The above gives me the intermediate table I need. Adding grouping and counting, I finally get the numbers I was looking for:

SELECT advert.id,advert_event.code,COUNT(advert_log.advertisement) FROM advert CROSS JOIN advert_event LEFT OUTER JOIN advert_log ON advert_log.advertisement=advert.id AND advert_log.event_code=advert_event.code GROUP BY advert.id,advert_event.code;

id | code | count
----+------+-------
 1 | C    |     1
 1 | I    |     1
 2 | C    |     1
 2 | I    |     2
 3 | C    |     0
 3 | I    |     2
 4 | C    |     0
 4 | I    |     0
(8 rows)
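
The zero counts appear because COUNT(advert_log.advertisement) only counts rows where that column is not NULL, whereas COUNT(*) would also count the NULL-padded row that the LEFT OUTER JOIN produces for unmatched pairs. A small sketch that shows both side by side (the column aliases rows_in_group and matched_log_rows are illustrative names only):

```sql
-- COUNT(*) counts every joined row, including the NULL-padded one;
-- COUNT(column) skips rows where the column is NULL.
SELECT advert.id,
       advert_event.code,
       COUNT(*) AS rows_in_group,
       COUNT(advert_log.advertisement) AS matched_log_rows
FROM advert
CROSS JOIN advert_event
LEFT OUTER JOIN advert_log
  ON advert_log.advertisement = advert.id
 AND advert_log.event_code = advert_event.code
GROUP BY advert.id, advert_event.code
ORDER BY advert.id, advert_event.code;
```

With the sample data, the pair (4, 'I') should show rows_in_group = 1 but matched_log_rows = 0.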

Finally, using two subselects (one for 'I' and one for 'C'), I wrote a query to get the counts per advertisement:

CREATE VIEW advertisement_dashboard AS
SELECT a.id, i.impressions, c.clicks,
       c.clicks::float / greatest(i.impressions, 1) AS cpi -- greatest() avoids division by zero
FROM advert a,
(
 SELECT advert.id, COUNT(advert_log.advertisement) AS impressions
 FROM advert CROSS JOIN advert_event
 LEFT OUTER JOIN advert_log ON advert_log.advertisement = advert.id AND advert_log.event_code = advert_event.code
 GROUP BY advert.id, advert_event.code HAVING code = 'I'
) i,
(
 SELECT advert.id, COUNT(advert_log.advertisement) AS clicks
 FROM advert CROSS JOIN advert_event
 LEFT OUTER JOIN advert_log ON advert_log.advertisement = advert.id AND advert_log.event_code = advert_event.code
 GROUP BY advert.id, advert_event.code HAVING code = 'C'
) c
WHERE i.id = a.id AND c.id = a.id
ORDER BY a.id ASC;

SELECT * FROM advertisement_dashboard;

id | impressions | clicks | cpi
----+-------------+--------+-----
 1 |           1 |      1 |   1
 2 |           2 |      1 | 0.5
 3 |           2 |      0 |   0
 4 |           0 |      0 |   0
(4 rows)
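
Since the post asks for simpler alternatives, here is one possible sketch that is not part of the original answer. It assumes PostgreSQL 9.4 or later, where the aggregate FILTER clause is available, and uses a single LEFT JOIN with conditional counts. It also swaps greatest(..., 1) for NULLIF(..., 0), so cpi comes out as NULL rather than 0 when there are no impressions; the view name advertisement_dashboard_filter is hypothetical.

```sql
-- Assumption: PostgreSQL 9.4+ (aggregate FILTER clause).
CREATE VIEW advertisement_dashboard_filter AS
SELECT a.id AS advertisement,
       COUNT(*) FILTER (WHERE l.event_code = 'I') AS impressions,
       COUNT(*) FILTER (WHERE l.event_code = 'C') AS clicks,
       -- NULLIF turns a zero impression count into NULL, so the division
       -- yields NULL instead of raising a division-by-zero error.
       (COUNT(*) FILTER (WHERE l.event_code = 'C'))::float
         / NULLIF(COUNT(*) FILTER (WHERE l.event_code = 'I'), 0) AS cpi
FROM advert a
LEFT JOIN advert_log l ON l.advertisement = a.id
GROUP BY a.id
ORDER BY a.id;
```

An advertisement with no log rows still gets one NULL-padded row from the LEFT JOIN, but neither FILTER condition matches it, so both counts are 0. Whether this performs better than the CROSS JOIN version would need to be checked with EXPLAIN ANALYZE on real data.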

Source: https://dba.stackexchange.com/questions/224127