Redshift - Insert data into a temp table with the condition that column doc_id is unique and not null

June 20, 2021

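I am trying to create a temp table in Redshift and insert data into it. My goal is to end up with exactly one record per unique doc_id, considering only rows WHERE doc_id IS NOT NULL.
Here is my code: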
-- Creating temp table to load only rows with unique and not null doc_id
DROP TABLE IF EXISTS TMP_table CASCADE;

CREATE TEMP TABLE IF NOT EXISTS TMP_table
(
   uuid varchar,
   id integer,
   doc_id integer,
   revenue double precision,
   doc_date varchar
);

-- insert into the temp table and add the distinct and not null filter on the doc_id
INSERT INTO TMP_table
(
   uuid,
   id,
   doc_id,
   revenue,
   doc_date
)
SELECT
   uuid,
   id,
   select DISTINCT (table_x.doc_id) from table_x where table_x.doc_id IS NOT NULL,
   revenue,
   doc_date
FROM schema.table_x;
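After running the code above, I get a syntax error near "distinct", and I can't figure out what is wrong.
Any guidance, please?

If there are duplicate doc_ids, I just want to pick one of them and insert it (it should also not be null, hence my WHERE clause).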
You can use a window/ranking function, e.g. ROW_NUMBER(), to simplify the query.
Note: I haven't checked whether these are available in Redshift.
INSERT INTO TMP_table
(
   uuid,
   id,
   doc_id,
   revenue,
   doc_date
)
SELECT 
   uuid,
   id,
   doc_id,
   revenue,
   doc_date
FROM
 (
   SELECT
       uuid,
       id,
       doc_id,
       revenue,
       doc_date,
       ROW_NUMBER() OVER          -- assign row numbers
         ( PARTITION BY doc_id    -- per doc_id
                                  -- without any specific order
         ) AS rn
   FROM schema.table_x
   WHERE doc_id IS NOT NULL 
 ) AS x
WHERE rn = 1    -- pick the first one per doc_id if there are 2+ 
;
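
Because the query above has no ORDER BY inside the window, which of several duplicate rows receives rn = 1 is arbitrary and may differ between runs. If a repeatable pick matters, a minimal sketch of a variant follows. It assumes the row with the smallest id should win and uses uuid as a further tiebreaker; both choices are assumptions for illustration, not something stated in the question. It ends with a sanity check on the result.

```sql
-- Sketch: same de-duplication, but with an explicit ordering per doc_id.
-- Assumption: the row with the smallest id should win, with uuid as a
-- further tiebreaker; swap in whichever columns express your preference.
INSERT INTO TMP_table
(
   uuid,
   id,
   doc_id,
   revenue,
   doc_date
)
SELECT
   uuid,
   id,
   doc_id,
   revenue,
   doc_date
FROM
 (
   SELECT
       uuid,
       id,
       doc_id,
       revenue,
       doc_date,
       ROW_NUMBER() OVER
         ( PARTITION BY doc_id     -- numbering restarts for each doc_id
           ORDER BY id, uuid       -- assumed tiebreakers
         ) AS rn
   FROM schema.table_x
   WHERE doc_id IS NOT NULL        -- drop NULL doc_ids before ranking
 ) AS x
WHERE rn = 1                       -- keep one row per doc_id
;

-- Sanity check: returns no rows if every doc_id in the temp table
-- is unique and non-null.
SELECT doc_id, COUNT(*) AS n
FROM TMP_table
GROUP BY doc_id
HAVING COUNT(*) > 1 OR doc_id IS NULL;
```

The pick is only fully repeatable if the ORDER BY columns identify a single row within each doc_id; if (id, uuid) can itself repeat, add further columns to the ordering.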

Source: https://dba.stackexchange.com/questions/294401
