How do I group the null results with the non-null results in this FULL OUTER JOIN?

March 17, 2017

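I am using a full join to look at the overlap and non-overlap (unique values) of user IDs from two different select statements. The main difference is that one select has deal_id = 0 and the other has deal_id greater than or equal to 1. I join the select statements on exchange_id, pub_id, and user_id, but not on deal_id.
The Incremental column is an attempt to count the users that exist in the deal_id >= 1 pool and do not exist in the deal_id = 0 pool (the main reason for the full join).
Here is a simplified version of the query I am running, with its two select statements: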
SET
hive.auto.convert.join = TRUE
;

SELECT
   First.deal_id
   ,COALESCE( First.exchange_id, Second.exchange_id ) as exchange_id
   ,COALESCE( First.pub_id, Second.pub_id ) as pub_id
   ,COUNT (DISTINCT(case when Second.user_id is null then First.user_id else null END)) AS Incremental
   ,SUM (First.imps) AS First_imps
   ,SUM (Second.imps) AS Second_imps
   FROM
       (
           SELECT
               a.deal_id
               ,a.exchange_id
               ,a.pub_id
               ,a.user_id
               ,1 AS imps
           FROM
               logs a 
           WHERE
               a.deal_id >= 1
           AND a.event_type = 'TRUE'
       ) First 
       FULL JOIN (
           SELECT
               a.exchange_id
               ,a.pub_id
               ,a.user_id
               ,1 AS imps
           FROM
               logs a
           WHERE
           a.deal_id = 0
           AND a.event_type = 'TRUE'
       ) Second
       ON (
           First.exchange_id = Second.exchange_id
           AND First.pub_id = Second.pub_id
           AND First.user_id = Second.user_id
       )
       GROUP BY
       First.deal_id
       ,COALESCE( First.exchange_id, Second.exchange_id )
       ,COALESCE( First.pub_id, Second.pub_id )
;
These are the results I see:
DEAL_ID    EXCHANGE_ID    PUB_ID    INCREMENTAL    FIRST_IMPS    SECOND_IMPS
/N         4              1780      0              0             15
/N         4              1560      0              0             32
3389       4              1780      2              7             6
1534       4              1560      4              9             8
This is what I would like to see:
DEAL_ID    EXCHANGE_ID    PUB_ID    INCREMENTAL    FIRST_IMPS    SECOND_IMPS
3389       4              1780      2              7             21
1534       4              1560      4              9             40
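That is, the results with a null deal_id should be folded into the results with a non-null deal_id that share the same exchange_id and pub_id.
What can I do?
This is similar to this question, but that solution does not work for this problem.
Note: I have already posted this question on Stack Overflow, but I thought I would try DBA as well.
Edit: here is a sqlfiddle that reproduces the problem. Note that it uses PostgreSQL, whereas I am using HQL.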

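I found a solution that works. It is not very elegant, and I am worried about how it scales (does it execute the subquery twice or only once?), but it works. Here is the fiddle: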

WITH subquery as
      ( 
SELECT
   First.deal_id
   ,COALESCE( First.exchange, Second.exchange ) as exchange_id
   ,COALESCE( First.publisher, Second.publisher ) as pub_id
   ,COUNT (DISTINCT(case when Second.user_id is null then First.user_id else null END)) AS Incremental
   ,SUM (First.imps) AS First_imps
   ,SUM (Second.imps) AS Second_imps
   FROM
       (
           SELECT
               a.deal_id
               ,a.exchange
               ,a.publisher
               ,a.user_id
               ,1 AS imps
           FROM
               T1 a 
           WHERE
               a.deal_id >= 1
       ) First 
       FULL OUTER JOIN (
           SELECT
               a.exchange
               ,a.publisher
               ,a.user_id
               ,1 AS imps
           FROM
               T1 a
           WHERE
           a.deal_id = 0
       ) Second
       ON (
           First.exchange = Second.exchange
           AND First.publisher = Second.publisher
           AND First.user_id = Second.user_id
       )
       GROUP BY
       First.deal_id
       ,COALESCE( First.exchange, Second.exchange )
       ,COALESCE( First.publisher, Second.publisher )
       )

SELECT
deal.deal_id,
deal.exchange_id,
deal.pub_id,
sum(deal.incremental),
sum(deal.first_imps),
sum(coalesce(deal.second_imps, 0) + coalesce(oa.second_imps,0))
FROM 
subquery deal
LEFT JOIN 
subquery oa 
ON (deal.exchange_id = oa.exchange_id
AND deal.pub_id = oa.pub_id
AND oa.deal_id is null)
WHERE deal.deal_id is not null
GROUP BY
deal.deal_id,
deal.exchange_id,
deal.pub_id
;
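
As a quick sanity check of the outer self-join above, the sketch below replays it on the four aggregated rows shown in the question. It assumes an in-memory SQLite database in place of Hive, and the table name `agg` and function name `fold_with_self_join` are made up for illustration; only the final SELECT is taken from the answer.

```python
import sqlite3

# Aggregated rows copied from the question's "results I see" table.
# None stands in for the NULL deal_id rows produced by the full join.
ROWS = [
    (None, 4, 1780, 0, 0, 15),
    (None, 4, 1560, 0, 0, 32),
    (3389, 4, 1780, 2, 7, 6),
    (1534, 4, 1560, 4, 9, 8),
]


def fold_with_self_join(rows):
    """Add each NULL-deal row's second_imps to the matching deal row."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for the CTE named "subquery".
    conn.execute(
        "CREATE TABLE agg (deal_id INTEGER, exchange_id INTEGER, pub_id INTEGER,"
        " incremental INTEGER, first_imps INTEGER, second_imps INTEGER)"
    )
    conn.executemany("INSERT INTO agg VALUES (?, ?, ?, ?, ?, ?)", rows)
    query = """
        SELECT deal.deal_id, deal.exchange_id, deal.pub_id,
               SUM(deal.incremental),
               SUM(deal.first_imps),
               SUM(COALESCE(deal.second_imps, 0) + COALESCE(oa.second_imps, 0))
        FROM agg deal
        LEFT JOIN agg oa
          ON deal.exchange_id = oa.exchange_id
         AND deal.pub_id = oa.pub_id
         AND oa.deal_id IS NULL
        WHERE deal.deal_id IS NOT NULL
        GROUP BY deal.deal_id, deal.exchange_id, deal.pub_id
        ORDER BY deal.deal_id DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in fold_with_self_join(ROWS):
    print(row)
```

On these four rows the self-join yields second_imps of 21 for deal 3389 and 40 for deal 1534, which matches the desired output in the question. It says nothing about how Hive plans the CTE, so the scaling concern still stands.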

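All this approach does is turn your original query into a derived table and then group it by exchange_id and pub_id, taking max(deal_id) so that each NULL deal_id row folds into the non-NULL row for the same exchange and publisher.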

SET hive.auto.convert.join = TRUE;

SELECT max(DEAL_ID) as DEAL_ID
    , EXCHANGE_ID
    , PUB_ID
    , sum(INCREMENTAL) as INCREMENTAL
    , sum(FIRST_IMPS) as FIRST_IMPS
    , sum(SECOND_IMPS) as SECOND_IMPS
 FROM (

   SELECT First.deal_id
        , COALESCE( First.exchange_id, Second.exchange_id ) as exchange_id
        , COALESCE( First.pub_id, Second.pub_id ) as pub_id
        , COUNT(DISTINCT(case when Second.user_id is null then First.user_id else null END)) AS Incremental
       , SUM(First.imps) AS First_imps
       , SUM(Second.imps) AS Second_imps
   FROM (SELECT a.deal_id
              , a.exchange_id
              , a.pub_id
              , a.user_id
              , 1 AS imps
           FROM logs a 
          WHERE a.deal_id >= 1
           AND a.event_type = 'TRUE'
        ) First 
FULL JOIN (SELECT a.exchange_id
               , a.pub_id
               , a.user_id
              , 1 AS imps
           FROM logs a
          WHERE a.deal_id = 0
            AND a.event_type = 'TRUE'
         ) Second
     ON (   First.exchange_id = Second.exchange_id
        AND First.pub_id = Second.pub_id
        AND First.user_id = Second.user_id
        )
  GROUP BY First.deal_id
         , COALESCE( First.exchange_id, Second.exchange_id )
         , COALESCE( First.pub_id, Second.pub_id )

 ) t  -- derived tables need an alias in Hive
 group by pub_id, exchange_id

;
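
To see what the outer GROUP BY does, the sketch below applies it to the four aggregated rows from the question. It assumes an in-memory SQLite database rather than Hive; the table name `agg`, the function name `fold_by_publisher`, and the column alias `top_deal_id` are invented for the example.

```python
import sqlite3

# Aggregated rows copied from the question's "results I see" table.
# None stands in for the NULL deal_id rows produced by the full join.
ROWS = [
    (None, 4, 1780, 0, 0, 15),
    (None, 4, 1560, 0, 0, 32),
    (3389, 4, 1780, 2, 7, 6),
    (1534, 4, 1560, 4, 9, 8),
]


def fold_by_publisher(rows):
    """Collapse rows per (exchange_id, pub_id); MAX skips NULL deal_ids."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for the derived table in the answer.
    conn.execute(
        "CREATE TABLE agg (deal_id INTEGER, exchange_id INTEGER, pub_id INTEGER,"
        " incremental INTEGER, first_imps INTEGER, second_imps INTEGER)"
    )
    conn.executemany("INSERT INTO agg VALUES (?, ?, ?, ?, ?, ?)", rows)
    query = """
        SELECT MAX(deal_id) AS top_deal_id, exchange_id, pub_id,
               SUM(incremental), SUM(first_imps), SUM(second_imps)
        FROM agg
        GROUP BY pub_id, exchange_id
        ORDER BY top_deal_id DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in fold_by_publisher(ROWS):
    print(row)

# Hypothetical extra row: a second deal on the same exchange and publisher.
EXTRA = ROWS + [(2000, 4, 1780, 1, 3, 2)]
for row in fold_by_publisher(EXTRA):
    print(row)
```

With the question's rows this reproduces the desired output. The second call shows a limitation that may matter in practice: once two deals share an exchange_id and pub_id, max(deal_id) reports only 3389 and the counts of both deals are summed together, so this grouping seems safe only if each publisher has at most one deal.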

Source: https://dba.stackexchange.com/questions/167063
