Redshift

Why does Redshift UNLOAD increase the precision of float4 columns in the output?

  • October 22, 2019

I created a table in AWS Redshift like this:

CREATE TABLE exampleTableName (
 id       int identity(1,1),
 accId    varchar(16) encode zstd,
 amount   float4,

 primary key(id)
)
distkey(accId)
interleaved sortkey(accId);

A sample record in the table has the value 120.12 in its amount column.

However, when I export the data by running UNLOAD, the resulting file (essentially a CSV) shows the column value with extra precision.

UNLOAD command:

UNLOAD ('SELECT * from exampleTableName')
TO 's3://bucket/prefixFile_'
IAM_ROLE 'XXX'
HEADER
ADDQUOTES
PARALLEL OFF
MAXFILESIZE AS 5gb
DELIMITER AS ',' 
GZIP;

The column value in the resulting output is 120.120003, i.e. four extra decimal places were added that are not in the original data set.

Why does this happen, and how can I prevent the extra precision (i.e. the extra decimal places) from being written out by the UNLOAD command?

Answer from the AWS forums:

This happens when you use FLOAT for decimal data, because FLOAT cannot store values
of arbitrary precision. https://en.wikipedia.org/wiki/IEEE_754
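The effect can be reproduced outside Redshift. The Python sketch below assumes that a float4 (REAL) column stores an IEEE 754 32-bit single-precision value, and that UNLOAD prints float4 with 9 significant digits, the number needed to round-trip any single-precision value. That second point is an assumption inferred from the 120.120003 in the question, not documented Redshift behaviour. The helper `to_float4` is hypothetical. It pushes a Python float through a 32-bit `struct` encoding so the nearest single-precision value becomes visible.

```python
import struct


def to_float4(x: float) -> float:
    """Hypothetical helper: round a Python float (a 64-bit double) to the
    nearest IEEE 754 single-precision value, as a float4 column would store it."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


stored = to_float4(120.12)

# The nearest float32 to 120.12 is 120 + 15729 / 2**17, slightly above 120.12.
print(repr(stored))
# 9 significant digits (enough to round-trip any float32): 120.120003
print(f"{stored:.9g}")
# 6 significant digits (FLT_DIG) hides the error again: 120.12
print(f"{stored:.6g}")
```

So the extra digits are not added by UNLOAD. They are already present in the stored binary value and only become visible when it is printed with full single-precision round-trip accuracy.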

I would generally always recommend using the DECIMAL data type unless you have an
existing application with an unchangeable requirement for FLOAT, e.g., a
calculation expects FLOAT and its output cannot change.

Additionally, by using DECIMAL you are able to use our new AZ64 compression encoding
which will reduce the amount of storage needed and improve your query performance.
https://aws.amazon.com/about-aws/whats-new/2019/10/amazon-redshift-introduces-az64-a-new-compression-encoding-for-optimized-storage-and-high-query-performance/
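DECIMAL avoids the problem because it stores a scaled integer, so 120.12 is kept exactly rather than as the nearest binary fraction. The sketch below uses Python's `decimal` module as a stand-in for Redshift's DECIMAL type; it is an analogy, not Redshift's implementation. The helper `to_decimal_scale2` is hypothetical. It approximates what casting an existing float4 value to a two-decimal DECIMAL would do, for example `CAST(amount AS DECIMAL(12,2))` in the UNLOAD query's SELECT list. The precision 12 and the ROUND_HALF_UP rounding mode are both assumptions.

```python
import struct
from decimal import ROUND_HALF_UP, Decimal


def to_float4(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_decimal_scale2(x: float) -> Decimal:
    """Hypothetical helper: approximate CAST(x AS DECIMAL(p, 2)) by rounding
    the exact binary value of x to 2 decimal places (rounding mode assumed)."""
    return Decimal(x).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


float4_value = to_float4(120.12)

# Decimal(float) exposes the exact binary value held by the float4 column.
print(Decimal(float4_value))
# Rounding to scale 2 recovers the intended value: 120.12
print(to_decimal_scale2(float4_value))
# A value stored as DECIMAL from the start is exact, with no rounding step: 120.12
print(Decimal("120.12"))
```

Casting at UNLOAD time only hides the error in the exported text; changing the column type to DECIMAL, as the answer recommends, stores the value exactly in the first place.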

Source: https://dba.stackexchange.com/questions/250485