
How can I group the null results in this FULL OUTER JOIN with the non-null ones?

  • March 17, 2017

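I'm using a full join to look at the overlapping and non-overlapping (unique) user IDs from two different select statements. The main difference is that one side has deal_id = 0, while the other has deal_id greater than or equal to 1. I join the select statements on exchange_id, pub_id, and user_id, but not on deal_id.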

The Incremental column is an attempt to count the users who exist in the deal_id >= 1 pool but not in the deal_id = 0 pool (the main reason for the full join).
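As a sketch of what Incremental is meant to count, here is the same set difference in plain Python. It assumes hypothetical toy rows of (deal_id, exchange_id, pub_id, user_id) that already pass the event_type = 'TRUE' filter. For each deal, it counts the distinct users who do not appear in the deal_id = 0 pool for the same exchange_id and pub_id.

```python
from collections import defaultdict

# Hypothetical toy rows: (deal_id, exchange_id, pub_id, user_id).
rows = [
    (3389, 4, 1780, "u1"),
    (3389, 4, 1780, "u2"),
    (3389, 4, 1780, "u3"),
    (0, 4, 1780, "u3"),
    (0, 4, 1780, "u4"),
]

def incremental(rows):
    """Distinct users per (deal_id, exchange_id, pub_id) with deal_id >= 1
    who never appear in the deal_id = 0 pool for the same exchange/pub."""
    base = defaultdict(set)   # (exchange_id, pub_id) -> users with deal_id = 0
    deals = defaultdict(set)  # (deal_id, exchange_id, pub_id) -> users with deal_id >= 1
    for deal_id, exchange_id, pub_id, user_id in rows:
        if deal_id == 0:
            base[(exchange_id, pub_id)].add(user_id)
        elif deal_id >= 1:
            deals[(deal_id, exchange_id, pub_id)].add(user_id)
    # key[1:] drops deal_id, leaving the (exchange_id, pub_id) pool key.
    return {key: len(users - base[key[1:]]) for key, users in deals.items()}

print(incremental(rows))  # {(3389, 4, 1780): 2}
```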

Here is a simplified version of my query, with the two select statements:

SET
hive.auto.convert.join = TRUE
;

SELECT
   First.deal_id
   ,COALESCE( First.exchange_id, Second.exchange_id ) as exchange_id
   ,COALESCE( First.pub_id, Second.pub_id ) as pub_id
   ,COUNT (DISTINCT(case when Second.user_id is null then First.user_id else null END)) AS Incremental
   ,SUM (First.imps) AS First_imps
   ,SUM (Second.imps) AS Second_imps
   FROM
       (
           SELECT
               a.deal_id
               ,a.exchange_id
               ,a.pub_id
               ,a.user_id
               ,1 AS imps
           FROM
               logs a 
           WHERE
               a.deal_id >= 1
           AND a.event_type = 'TRUE'
       ) First 
       FULL JOIN (
           SELECT
               a.exchange_id
               ,a.pub_id
               ,a.user_id
               ,1 AS imps
           FROM
               logs a
           WHERE
                a.deal_id = 0
           AND a.event_type = 'TRUE'
       ) Second
       ON (
           First.exchange_id = Second.exchange_id
           AND First.pub_id = Second.pub_id
           AND First.user_id = Second.user_id
       )
       GROUP BY
       First.deal_id
       ,COALESCE( First.exchange_id, Second.exchange_id )
       ,COALESCE( First.pub_id, Second.pub_id )
;
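To see where the NULL rows come from, here is a minimal reproduction using Python's built-in sqlite3 module on hypothetical toy data (the table contents are made up for illustration). Because older SQLite builds lack FULL OUTER JOIN, the sketch emulates it as a LEFT JOIN plus the unmatched deal_id = 0 rows (UNION ALL with NOT EXISTS). The deal_id = 0 rows that find no user_id match come out with First.deal_id = NULL, and because First.deal_id is in the GROUP BY, they form a separate group, which splits Second_imps across two rows.

```python
import sqlite3

# Hypothetical toy data: (deal_id, exchange_id, pub_id, user_id).
ROWS = [
    (3389, 4, 1780, "u1"), (3389, 4, 1780, "u2"), (3389, 4, 1780, "u3"),
    (0, 4, 1780, "u3"), (0, 4, 1780, "u4"),
]

# FULL OUTER JOIN emulated as LEFT JOIN + unmatched right-side rows,
# then grouped like the original query (including First.deal_id).
QUERY = """
WITH f AS (SELECT deal_id, exchange_id, pub_id, user_id, 1 AS imps
           FROM logs WHERE deal_id >= 1),
     s AS (SELECT exchange_id, pub_id, user_id, 1 AS imps
           FROM logs WHERE deal_id = 0),
     joined AS (
       SELECT f.deal_id, f.exchange_id AS f_ex, f.pub_id AS f_pub,
              f.user_id AS f_user, f.imps AS f_imps,
              s.exchange_id AS s_ex, s.pub_id AS s_pub,
              s.user_id AS s_user, s.imps AS s_imps
       FROM f LEFT JOIN s
         ON f.exchange_id = s.exchange_id AND f.pub_id = s.pub_id
        AND f.user_id = s.user_id
       UNION ALL
       SELECT NULL, NULL, NULL, NULL, NULL,
              s.exchange_id, s.pub_id, s.user_id, s.imps
       FROM s
       WHERE NOT EXISTS (SELECT 1 FROM f
                         WHERE f.exchange_id = s.exchange_id
                           AND f.pub_id = s.pub_id
                           AND f.user_id = s.user_id))
SELECT deal_id,
       COALESCE(f_ex, s_ex) AS exchange_id,
       COALESCE(f_pub, s_pub) AS pub_id,
       COUNT(DISTINCT CASE WHEN s_user IS NULL THEN f_user END) AS incremental,
       SUM(f_imps) AS first_imps,
       SUM(s_imps) AS second_imps
FROM joined
GROUP BY deal_id, COALESCE(f_ex, s_ex), COALESCE(f_pub, s_pub)
ORDER BY deal_id
"""

def run(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE logs (deal_id INT, exchange_id INT, pub_id INT, user_id TEXT)")
    con.executemany("INSERT INTO logs VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

for row in run(ROWS):
    print(row)
# (None, 4, 1780, 0, None, 1)
# (3389, 4, 1780, 2, 3, 1)
```

The two deal_id = 0 impressions for exchange 4 / pub 1780 end up split 1 and 1 between the NULL row and the deal row, which is the same pattern as in the results table below.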

Here are the results I see:

DEAL_ID    EXCHANGE_ID    PUB_ID    INCREMENTAL    FIRST_IMPS    SECOND_IMPS
/N         4              1780      0              0             15
/N         4              1560      0              0             32
3389       4              1780      2              7             6
1534       4              1560      4              9             8

Here is what I'd like to see:

DEAL_ID    EXCHANGE_ID    PUB_ID    INCREMENTAL    FIRST_IMPS    SECOND_IMPS
3389       4              1780      2              7             21
1534       4              1560      4              9             40

In other words, each row with a null deal ID should be merged into the row with a non-null deal ID that has the same exchange_id and pub_id.
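One way to get that shape, sketched here as an alternative rather than what either answer below does, is to stop grouping the deal_id = 0 side by deal: aggregate the deal_id = 0 pool once per (exchange_id, pub_id), compute Incremental with a LEFT JOIN anti-join against that pool's distinct users, and then attach the pool's impression total to every deal row. The sketch runs in Python's sqlite3 on hypothetical toy data. The SQL only uses CTEs, LEFT JOIN, COALESCE, and COUNT(DISTINCT ...), which HiveQL also supports, but it has not been tested on Hive. Unlike the full join, it drops (exchange_id, pub_id) pairs that only have deal_id = 0 traffic, which matches the desired output.

```python
import sqlite3

# Hypothetical toy data: (deal_id, exchange_id, pub_id, user_id).
ROWS = [
    (3389, 4, 1780, "u1"), (3389, 4, 1780, "u2"), (3389, 4, 1780, "u3"),
    (0, 4, 1780, "u3"), (0, 4, 1780, "u4"),
    (1534, 4, 1560, "u5"), (1534, 4, 1560, "u6"),
    (0, 4, 1560, "u6"), (0, 4, 1560, "u7"), (0, 4, 1560, "u7"),
]

QUERY = """
WITH base_users AS (   -- distinct users in the deal_id = 0 pool
  SELECT DISTINCT exchange_id, pub_id, user_id FROM logs WHERE deal_id = 0),
base_imps AS (         -- all deal_id = 0 impressions per exchange/pub
  SELECT exchange_id, pub_id, COUNT(*) AS second_imps
  FROM logs WHERE deal_id = 0 GROUP BY exchange_id, pub_id),
per_deal AS (          -- DISTINCT in base_users keeps deal row counts intact
  SELECT d.deal_id, d.exchange_id, d.pub_id,
         COUNT(DISTINCT CASE WHEN b.user_id IS NULL THEN d.user_id END) AS incremental,
         COUNT(*) AS first_imps
  FROM logs d
  LEFT JOIN base_users b
    ON d.exchange_id = b.exchange_id AND d.pub_id = b.pub_id
   AND d.user_id = b.user_id
  WHERE d.deal_id >= 1
  GROUP BY d.deal_id, d.exchange_id, d.pub_id)
SELECT p.deal_id, p.exchange_id, p.pub_id, p.incremental, p.first_imps,
       COALESCE(i.second_imps, 0) AS second_imps
FROM per_deal p
LEFT JOIN base_imps i
  ON p.exchange_id = i.exchange_id AND p.pub_id = i.pub_id
ORDER BY p.deal_id
"""

def run(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE logs (deal_id INT, exchange_id INT, pub_id INT, user_id TEXT)")
    con.executemany("INSERT INTO logs VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

for row in run(ROWS):
    print(row)
# (1534, 4, 1560, 1, 2, 3)
# (3389, 4, 1780, 2, 3, 2)
```

Because Second_imps comes from a per-(exchange_id, pub_id) aggregate rather than from the joined rows, every deal on that exchange/pub sees the full deal_id = 0 total, and no NULL deal rows are produced.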

What can I do?

This is similar to this question, but its solution doesn't work for this problem.

Note: I've already posted this question on Stack Overflow, and I thought I'd try DBA as well.

Edit: here is a sqlfiddle that reproduces the problem. Note that it uses PostgreSQL, whereas I'm using HQL.

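I found a solution that works. It isn't very elegant, and I'm worried about how it scales (does it execute the subquery twice or once?), but it works. Here is the fiddle: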

WITH subquery as
      ( 
SELECT
   First.deal_id
   ,COALESCE( First.exchange, Second.exchange ) as exchange_id
   ,COALESCE( First.publisher, Second.publisher ) as pub_id
   ,COUNT (DISTINCT(case when Second.user_id is null then First.user_id else null END)) AS Incremental
   ,SUM (First.imps) AS First_imps
   ,SUM (Second.imps) AS Second_imps
   FROM
       (
           SELECT
               a.deal_id
               ,a.exchange
               ,a.publisher
               ,a.user_id
               ,1 AS imps
           FROM
               T1 a 
           WHERE
               a.deal_id >= 1
       ) First 
       FULL OUTER JOIN (
           SELECT
               a.exchange
               ,a.publisher
               ,a.user_id
               ,1 AS imps
           FROM
               T1 a
           WHERE
                a.deal_id = 0
       ) Second
       ON (
           First.exchange = Second.exchange
           AND First.publisher = Second.publisher
           AND First.user_id = Second.user_id
       )
       GROUP BY
       First.deal_id
       ,COALESCE( First.exchange, Second.exchange )
       ,COALESCE( First.publisher, Second.publisher )
       )

SELECT
deal.deal_id,
deal.exchange_id,
deal.pub_id,
sum(deal.incremental),
sum(deal.first_imps),
sum(coalesce(deal.second_imps, 0) + coalesce(oa.second_imps,0))
FROM 
subquery deal
LEFT JOIN 
subquery oa 
ON (deal.exchange_id = oa.exchange_id
AND deal.pub_id = oa.pub_id
AND oa.deal_id is null)
WHERE deal.deal_id is not null
GROUP BY
deal.deal_id,
deal.exchange_id,
deal.pub_id
;
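To check the outer step of this self-answer in isolation, the sketch below loads the intermediate rows shown in the question (the "results I see" table, with None standing for the NULL deal_id) into a table named subquery and runs the outer SELECT against it in Python's sqlite3. It is a stand-in for the Hive CTE, not a test of the full query.

```python
import sqlite3

# Intermediate rows from the question's "results I see" table:
# (deal_id, exchange_id, pub_id, incremental, first_imps, second_imps).
SUBQUERY_ROWS = [
    (None, 4, 1780, 0, 0, 15),
    (None, 4, 1560, 0, 0, 32),
    (3389, 4, 1780, 2, 7, 6),
    (1534, 4, 1560, 4, 9, 8),
]

# Outer part of the self-answer: attach each NULL-deal row to the
# non-NULL deal rows with the same exchange_id and pub_id.
OUTER = """
SELECT deal.deal_id, deal.exchange_id, deal.pub_id,
       SUM(deal.incremental) AS incremental,
       SUM(deal.first_imps) AS first_imps,
       SUM(COALESCE(deal.second_imps, 0) + COALESCE(oa.second_imps, 0)) AS second_imps
FROM subquery deal
LEFT JOIN subquery oa
  ON deal.exchange_id = oa.exchange_id
 AND deal.pub_id = oa.pub_id
 AND oa.deal_id IS NULL
WHERE deal.deal_id IS NOT NULL
GROUP BY deal.deal_id, deal.exchange_id, deal.pub_id
ORDER BY deal.deal_id
"""

def merge(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE subquery (deal_id INT, exchange_id INT, pub_id INT, "
                "incremental INT, first_imps INT, second_imps INT)")
    con.executemany("INSERT INTO subquery VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(OUTER).fetchall()
    con.close()
    return result

print(merge(SUBQUERY_ROWS))
# [(1534, 4, 1560, 4, 9, 40), (3389, 4, 1780, 2, 7, 21)]
```

The result matches the desired table: the NULL-deal rows' Second_imps (15 and 32) are added to the deal rows with the same exchange_id and pub_id.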

All this does is turn your original query into a derived table and then group it by exchange_id and pub_id.

SET hive.auto.convert.join = TRUE;

SELECT max(DEAL_ID) as DEAL_ID
    , EXCHANGE_ID
    , PUB_ID
    , sum(INCREMENTAL) as INCREMENTAL
    , sum(FIRST_IMPS) as FIRST_IMPS
    , sum(SECOND_IMPS) as SECOND_IMPS
 FROM (

   SELECT First.deal_id
        , COALESCE( First.exchange_id, Second.exchange_id ) as exchange_id
        , COALESCE( First.pub_id, Second.pub_id ) as pub_id
        , COUNT(DISTINCT(case when Second.user_id is null then First.user_id else null END)) AS Incremental
       , SUM(First.imps) AS First_imps
       , SUM(Second.imps) AS Second_imps
   FROM (SELECT a.deal_id
              , a.exchange_id
              , a.pub_id
              , a.user_id
              , 1 AS imps
           FROM logs a 
          WHERE a.deal_id >= 1
           AND a.event_type = 'TRUE'
        ) First 
FULL JOIN (SELECT a.exchange_id
               , a.pub_id
               , a.user_id
              , 1 AS imps
           FROM logs a
          WHERE a.deal_id = 0
            AND a.event_type = 'TRUE'
         ) Second
     ON (   First.exchange_id = Second.exchange_id
        AND First.pub_id = Second.pub_id
        AND First.user_id = Second.user_id
        )
  GROUP BY First.deal_id
         , COALESCE( First.exchange_id, Second.exchange_id )
         , COALESCE( First.pub_id, Second.pub_id )

 ) t group by pub_id, exchange_id

;
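You can use the same intermediate rows from the question to sketch this answer's outer GROUP BY in Python's sqlite3. The extra row (5000, 4, 1780, 1, 3, 2) in the second call is hypothetical. It is added only to show what happens when two deals share an exchange_id/pub_id pair.

```python
import sqlite3

# Intermediate rows from the question's "results I see" table:
# (deal_id, exchange_id, pub_id, incremental, first_imps, second_imps).
SUBQUERY_ROWS = [
    (None, 4, 1780, 0, 0, 15),
    (None, 4, 1560, 0, 0, 32),
    (3389, 4, 1780, 2, 7, 6),
    (1534, 4, 1560, 4, 9, 8),
]

# Outer part of the answer: MAX() ignores the NULL deal_id, and the
# SUMs fold the NULL-deal row into the same exchange/pub group.
OUTER = """
SELECT MAX(deal_id) AS deal_id, exchange_id, pub_id,
       SUM(incremental) AS incremental,
       SUM(first_imps) AS first_imps,
       SUM(second_imps) AS second_imps
FROM subquery t
GROUP BY pub_id, exchange_id
ORDER BY 1
"""

def collapse(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE subquery (deal_id INT, exchange_id INT, pub_id INT, "
                "incremental INT, first_imps INT, second_imps INT)")
    con.executemany("INSERT INTO subquery VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(OUTER).fetchall()
    con.close()
    return result

print(collapse(SUBQUERY_ROWS))
# [(1534, 4, 1560, 4, 9, 40), (3389, 4, 1780, 2, 7, 21)]

# Hypothetical second deal on exchange 4 / pub 1780:
print(collapse(SUBQUERY_ROWS + [(5000, 4, 1780, 1, 3, 2)]))
# [(1534, 4, 1560, 4, 9, 40), (5000, 4, 1780, 3, 10, 23)]
```

On the question's data, this reproduces the desired table. If a second deal shares the same exchange_id and pub_id, MAX(deal_id) keeps only the larger ID and the sums merge both deals into one row. The self-join approach in the earlier answer keeps them separate.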

Source: https://dba.stackexchange.com/questions/167063