Redshift
Why does Redshift UNLOAD increase the precision of a float4 column in the output?
I created a table in AWS Redshift, for example:
CREATE TABLE exampleTableName ( id int identity(1,1), accId varchar(16) encode zstd, amount float4, primary key(id) ) distkey(accId) interleaved sortkey(accId);
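A sample record in the table has an amount value of 120.12. However, when I export the data by running UNLOAD, the resulting file (essentially a CSV) shows extra precision for that field.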
UNLOAD command:
UNLOAD ('SELECT * from exampleTableName') TO 's3://bucket/prefixFile_' IAM_ROLE 'XXX' HEADER ADDQUOTES PARALLEL OFF MAXFILESIZE AS 5gb DELIMITER AS ',' GZIP;
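Field value in the resulting output: 120.120003 (i.e., it adds 4 decimal places that are not in the original dataset). Why does this happen, and how can I prevent the extra precision (i.e., the extra decimal places) from being written out by the UNLOAD command?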
Answer from the AWS forums:
This happens when you use FLOAT for decimal data, because FLOAT cannot store values of arbitrary precision. https://en.wikipedia.org/wiki/IEEE_754 I would generally recommend using the DECIMAL data type unless you have an existing application with an unchangeable requirement for FLOAT, e.g., a calculation expects FLOAT and the output cannot change. Additionally, by using DECIMAL you are able to use our new AZ64 compression encoding, which will reduce the amount of storage needed and improve your query performance. https://aws.amazon.com/about-aws/whats-new/2019/10/amazon-redshift-introduces-az64-a-new-compression-encoding-for-optimized-storage-and-high-query-performance/
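To illustrate the IEEE 754 point in the answer, here is a minimal Python sketch (not Redshift code) that rounds 120.12 to a 32-bit float and prints it with different numbers of decimal places. The helper name as_float4 is hypothetical, and the sketch assumes that float4 is a standard 32-bit IEEE 754 single-precision value; it does not claim to reproduce the exact formatting rule UNLOAD uses.

```python
import struct
from decimal import Decimal


def as_float4(value: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit IEEE 754 value.

    Hypothetical helper for illustration; it mimics storing the value
    in a 4-byte float column and reading it back.
    """
    return struct.unpack("<f", struct.pack("<f", value))[0]


stored = as_float4(120.12)

# With two decimal places the stored value still looks like the input.
print(f"{stored:.2f}")  # 120.12

# With six decimal places the nearest representable float32 shows through.
print(f"{stored:.6f}")  # 120.120003

# A decimal type keeps the value exactly as written.
print(Decimal("120.12"))  # 120.12
```

The extra digits are therefore already present in the stored value rather than added by the export. Following the answer's recommendation, one option would be to declare the column as DECIMAL, or (as an untested assumption for this table) to cast it in the unloaded query, for example CAST(amount AS DECIMAL(18,2)).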