Hive
Cumulative sum using HiveQL
I have a table in Hive that looks like this:
col1  col2
b     1
b     2
a     3
b     2
c     4
c     5
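How can I use HiveQL to group the rows by col1, sum col2 within each group, order the groups by that sum, and add a cumulative sum (csum) based on it? The result I want is:

id  sum_all  csum
a   3        3
b   5        8
c   9        17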
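I have only managed to come up with the grouping and the sum, and I have no idea how to get the cumulative sum. Hive does not support correlated subqueries.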
select col1 as id,
       sum(col2) as sum_all
from t
group by col1
order by sum_all
The result is:
id  sum_all
a   3
b   5
c   9
Since correlated subqueries are not allowed, try using derived tables and then joining them:
select a.id,
       a.sum_all,
       sum(b.sum_all) as csum
from
    ( select col1 as id, sum(col2) as sum_all
      from t
      group by col1
    ) a
  join
    ( select col1 as id, sum(col2) as sum_all
      from t
      group by col1
    ) b
  on ( b.sum_all < a.sum_all )
  or ( b.sum_all = a.sum_all and b.id <= a.id )
group by a.sum_all, a.id
order by a.sum_all, a.id ;
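This is essentially a self-join on the derived group-by table. It may be more efficient to save the grouped result into a temporary table first and then do the self-join on that.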
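The temporary-table idea could look roughly like the sketch below. It has not been tested against a specific Hive release: t_grouped is a made-up name, and it assumes a version that supports CREATE TEMPORARY TABLE (0.14 or later, as far as I know). The join is written as a cross join plus WHERE because, to my knowledge, older Hive releases accept only equality conditions in ON. This in turn assumes cartesian products are not blocked by strict-mode settings.

```sql
-- Hypothetical temporary table holding one row per col1 value.
create temporary table t_grouped as
select col1 as id,
       sum(col2) as sum_all
from t
group by col1;

-- Self-join: for each row a, add up every row b that sorts at or before it
-- when ordering by (sum_all, id). That total is the cumulative sum.
select a.id,
       a.sum_all,
       sum(b.sum_all) as csum
from t_grouped a
  cross join t_grouped b
where b.sum_all < a.sum_all
   or ( b.sum_all = a.sum_all and b.id <= a.id )
group by a.sum_all, a.id
order by sum_all, id;
```

The grouping is computed once instead of twice, but the self-join still compares every group with every other group, so it only pays off when the number of distinct col1 values is modest.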
According to the manual, Hive also has windowing aggregates, so you could use those as well:
select a.id,
       a.sum_all,
       sum(a.sum_all) over (order by a.sum_all, a.id
                            rows between unbounded preceding
                                     and current row) as csum
from
    ( select col1 as id, sum(col2) as sum_all
      from t
      group by col1
    ) a
order by sum_all, id ;
Or, without the derived table:
select col1 as id,
       sum(col2) as sum_all,
       sum(sum(col2)) over (order by sum(col2), col1
                            rows between unbounded preceding
                                     and current row) as csum
from t
group by col1
order by sum_all, id ;
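To try any of the queries above on the sample data from the question, a setup along these lines should do. It is only a sketch: the column types are my assumption (the question does not state them), and INSERT ... VALUES needs a reasonably recent Hive (0.14 or later, I believe).

```sql
-- Assumed schema: col1 as a string key, col2 as an integer value.
create table t (col1 string, col2 int);

-- The six sample rows from the question.
insert into table t values
  ('b', 1),
  ('b', 2),
  ('a', 3),
  ('b', 2),
  ('c', 4),
  ('c', 5);
```

With this data the groups sum to a = 3, b = 5 and c = 9, so each query should return csum values of 3, 8 and 17, matching the table in the question.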