How can I group the null results of this FULL OUTER JOIN with the non-null ones?
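I am using a full join to look at the overlapping and non-overlapping (unique) user IDs coming from two different select statements. The main difference is that one side has deal_id = 0, while the other has deal_id greater than or equal to 1. I joined the select statements on exchange_id, pub_id and user_id, but not on deal_id.
The Incremental column is an attempt to count users who exist in the deal_id >= 1 pool but do not exist in the deal_id = 0 pool (the main reason for the full join).
Here is a simplified version of the query I ran; it consists of two select statements: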
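To make the intended Incremental rule concrete, here is a small Python sketch (not HiveQL) of the same counting logic: a user counts toward a (deal_id, exchange_id, pub_id) group if they appear in the deal_id >= 1 pool and never appear in the deal_id = 0 pool for the same exchange_id and pub_id. The event rows and the function name `incremental` are hypothetical, and the event_type filter is left out for brevity.

```python
# Hypothetical event rows: (deal_id, exchange_id, pub_id, user_id).
# These values are made up for illustration; they are not from the question.
EVENTS = [
    (3389, 4, 1780, "u1"),
    (3389, 4, 1780, "u2"),
    (3389, 4, 1780, "u2"),
    (0, 4, 1780, "u2"),
    (0, 4, 1780, "u3"),
]


def incremental(events):
    """Count distinct users per (deal_id, exchange_id, pub_id) in the
    deal_id >= 1 pool who never appear in the deal_id = 0 pool for the
    same exchange_id and pub_id (the full join's match condition)."""
    # Keys of the deal_id = 0 side, matching the ON clause (exchange, pub, user).
    open_pool = {(ex, pub, user) for deal, ex, pub, user in events if deal == 0}
    users = {}
    for deal, ex, pub, user in events:
        if deal >= 1:
            key = (deal, ex, pub)
            # Register the group even if it ends up with zero incremental users.
            users.setdefault(key, set())
            # No match on the other side plays the role of Second.user_id IS NULL.
            if (ex, pub, user) not in open_pool:
                users[key].add(user)
    return {key: len(found) for key, found in users.items()}


print(incremental(EVENTS))
```

Here u1 counts as incremental and u2 does not, because u2 also shows up with deal_id = 0 on the same exchange and publisher.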
SET hive.auto.convert.join = TRUE;

SELECT
    First.deal_id
  , COALESCE(First.exchange_id, Second.exchange_id) AS exchange_id
  , COALESCE(First.pub_id, Second.pub_id) AS pub_id
  , COUNT(DISTINCT (CASE WHEN Second.user_id IS NULL THEN First.user_id ELSE NULL END)) AS Incremental
  , SUM(First.imps) AS First_imps
  , SUM(Second.imps) AS Second_imps
FROM (
    SELECT a.deal_id, a.exchange_id, a.pub_id, a.user_id, 1 AS imps
    FROM logs a
    WHERE a.deal_id >= 1
      AND a.event_type = 'TRUE'
) First
FULL JOIN (
    SELECT a.exchange_id, a.pub_id, a.user_id, 1 AS imps
    FROM logs a
    WHERE a.deal_id = 0
      AND a.event_type = 'TRUE'
) Second
  ON (    First.exchange_id = Second.exchange_id
      AND First.pub_id = Second.pub_id
      AND First.user_id = Second.user_id )
GROUP BY
    First.deal_id
  , COALESCE(First.exchange_id, Second.exchange_id)
  , COALESCE(First.pub_id, Second.pub_id);
Here are the results I get:

DEAL_ID  EXCHANGE_ID  PUB_ID  INCREMENTAL  FIRST_IMPS  SECOND_IMPS
/N       4            1780    0            0           15
/N       4            1560    0            0           32
3389     4            1780    2            7           6
1534     4            1560    4            9           8

Here is what I would like to get:

DEAL_ID  EXCHANGE_ID  PUB_ID  INCREMENTAL  FIRST_IMPS  SECOND_IMPS
3389     4            1780    2            7           21
1534     4            1560    4            9           40
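That is, each row with a null deal ID should be merged into the row with a non-null deal ID that has the same exchange_id and pub_id (for example, SECOND_IMPS 15 + 6 = 21 and 32 + 8 = 40).
What can I do?
This is similar to this question, but that solution does not work for this problem.
Note: I have already posted this question on Stack Overflow, but I thought I would try DBA as well.
Edit: here is a SQL Fiddle that reproduces the problem. Note that it uses PostgreSQL, whereas I am using HQL.
I found a solution that works. It is not very elegant, and I am worried about how it scales (does it execute the subquery twice or only once?), but it works. Here is the fiddle: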
WITH subquery AS (
    SELECT
        First.deal_id
      , COALESCE(First.exchange, Second.exchange) AS exchange_id
      , COALESCE(First.publisher, Second.publisher) AS pub_id
      , COUNT(DISTINCT (CASE WHEN Second.user_id IS NULL THEN First.user_id ELSE NULL END)) AS Incremental
      , SUM(First.imps) AS First_imps
      , SUM(Second.imps) AS Second_imps
    FROM (
        SELECT a.deal_id, a.exchange, a.publisher, a.user_id, 1 AS imps
        FROM T1 a
        WHERE a.deal_id >= 1
    ) First
    FULL OUTER JOIN (
        SELECT a.exchange, a.publisher, a.user_id, 1 AS imps
        FROM T1 a
        WHERE a.deal_id = 0
    ) Second
      ON (    First.exchange = Second.exchange
          AND First.publisher = Second.publisher
          AND First.user_id = Second.user_id )
    GROUP BY
        First.deal_id
      , COALESCE(First.exchange, Second.exchange)
      , COALESCE(First.publisher, Second.publisher)
)
SELECT
    deal.deal_id
  , deal.exchange_id
  , deal.pub_id
  , SUM(deal.incremental)
  , SUM(deal.first_imps)
  , SUM(COALESCE(deal.second_imps, 0) + COALESCE(oa.second_imps, 0))
FROM subquery deal
LEFT JOIN subquery oa
  ON (    deal.exchange_id = oa.exchange_id
      AND deal.pub_id = oa.pub_id
      AND oa.deal_id IS NULL )
WHERE deal.deal_id IS NOT NULL
GROUP BY deal.deal_id, deal.exchange_id, deal.pub_id;
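To check just the merge step of this workaround, the sketch below loads the four intermediate rows from the results in the question into an in-memory SQLite database (Python's standard sqlite3 module) as a stand-in for the WITH subquery, then runs the final self-join against it. Using SQLite instead of Hive, treating "/N" as NULL, and the names `joined` and `merge_with_self_join` are all assumptions for illustration.

```python
import sqlite3

# Intermediate rows reported in the question; "/N" is assumed to mean NULL (None).
ROWS = [
    (None, 4, 1780, 0, 0, 15),
    (None, 4, 1560, 0, 0, 32),
    (3389, 4, 1780, 2, 7, 6),
    (1534, 4, 1560, 4, 9, 8),
]


def merge_with_self_join(rows):
    con = sqlite3.connect(":memory:")
    # Hypothetical table standing in for the CTE named "subquery".
    con.execute(
        "CREATE TABLE joined (deal_id INTEGER, exchange_id INTEGER, pub_id INTEGER,"
        " incremental INTEGER, first_imps INTEGER, second_imps INTEGER)"
    )
    con.executemany("INSERT INTO joined VALUES (?, ?, ?, ?, ?, ?)", rows)
    # Each non-null deal row picks up the null-deal row with the same exchange and pub.
    result = con.execute(
        """
        SELECT deal.deal_id, deal.exchange_id, deal.pub_id,
               SUM(deal.incremental), SUM(deal.first_imps),
               SUM(COALESCE(deal.second_imps, 0) + COALESCE(oa.second_imps, 0))
        FROM joined deal
        LEFT JOIN joined oa
          ON deal.exchange_id = oa.exchange_id
         AND deal.pub_id = oa.pub_id
         AND oa.deal_id IS NULL
        WHERE deal.deal_id IS NOT NULL
        GROUP BY deal.deal_id, deal.exchange_id, deal.pub_id
        ORDER BY deal.deal_id DESC
        """
    ).fetchall()
    con.close()
    return result


print(merge_with_self_join(ROWS))
```

On these rows it reproduces the desired output. One caveat of this design: if several non-null deal_ids share an exchange_id and pub_id, the null-deal SECOND_IMPS is added to every one of them.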
All this does is turn your original query into a derived table and then group it by pub_id and exchange_id.
SET hive.auto.convert.join = TRUE;

SELECT
    max(DEAL_ID) AS DEAL_ID          -- MAX ignores NULLs, so the non-null deal_id is kept
  , EXCHANGE_ID
  , PUB_ID
  , sum(INCREMENTAL) AS INCREMENTAL
  , sum(FIRST_IMPS)  AS FIRST_IMPS
  , sum(SECOND_IMPS) AS SECOND_IMPS
FROM (
    SELECT
        First.deal_id
      , COALESCE(First.exchange_id, Second.exchange_id) AS exchange_id
      , COALESCE(First.pub_id, Second.pub_id) AS pub_id
      , COUNT(DISTINCT (CASE WHEN Second.user_id IS NULL THEN First.user_id ELSE NULL END)) AS Incremental
      , SUM(First.imps) AS First_imps
      , SUM(Second.imps) AS Second_imps
    FROM (
        SELECT a.deal_id, a.exchange_id, a.pub_id, a.user_id, 1 AS imps
        FROM logs a
        WHERE a.deal_id >= 1
          AND a.event_type = 'TRUE'
    ) First
    FULL JOIN (
        SELECT a.exchange_id, a.pub_id, a.user_id, 1 AS imps
        FROM logs a
        WHERE a.deal_id = 0
          AND a.event_type = 'TRUE'
    ) Second
      ON (    First.exchange_id = Second.exchange_id
          AND First.pub_id = Second.pub_id
          AND First.user_id = Second.user_id )
    GROUP BY
        First.deal_id
      , COALESCE(First.exchange_id, Second.exchange_id)
      , COALESCE(First.pub_id, Second.pub_id)
) t                                  -- Hive requires an alias on a subquery in FROM
GROUP BY pub_id, exchange_id;
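The outer query can be checked on the intermediate rows from the question. This sketch uses Python's sqlite3 module as a stand-in for Hive, assumes "/N" means NULL, and uses a hypothetical table `joined` in place of the inner full-join query. Like Hive, SQLite's MAX and SUM skip NULLs, which is what lets the null-deal rows fold into the deal rows.

```python
import sqlite3

# Intermediate rows reported in the question; "/N" is assumed to mean NULL (None).
ROWS = [
    (None, 4, 1780, 0, 0, 15),
    (None, 4, 1560, 0, 0, 32),
    (3389, 4, 1780, 2, 7, 6),
    (1534, 4, 1560, 4, 9, 8),
]


def regroup(rows):
    con = sqlite3.connect(":memory:")
    # Hypothetical table holding the output of the inner (original) query.
    con.execute(
        "CREATE TABLE joined (deal_id INTEGER, exchange_id INTEGER, pub_id INTEGER,"
        " incremental INTEGER, first_imps INTEGER, second_imps INTEGER)"
    )
    con.executemany("INSERT INTO joined VALUES (?, ?, ?, ?, ?, ?)", rows)
    # Derived table + GROUP BY on (pub_id, exchange_id), as in the answer.
    result = con.execute(
        """
        SELECT MAX(deal_id) AS deal_id, exchange_id, pub_id,
               SUM(incremental), SUM(first_imps), SUM(second_imps)
        FROM (SELECT * FROM joined) t
        GROUP BY pub_id, exchange_id
        ORDER BY MAX(deal_id) DESC
        """
    ).fetchall()
    con.close()
    return result


print(regroup(ROWS))
```

This approach assumes at most one non-null deal_id per (exchange_id, pub_id). If there were several, they would collapse into a single row labeled with the largest deal_id, and their INCREMENTAL counts would simply be added together.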