ClickHouse

ClickHouse: creating a database structure for JSON data

  • June 12, 2020

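I'm new to ClickHouse and stuck on how to create the database structure for importing nested JSON data.

Take JSON data like the following as an example.

When the fields are populated: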

"FirewallMatchesActions": [
   "allow"
 ],
 "FirewallMatchesRuleIDs": [
   "1234abc"
 ],
 "FirewallMatchesSources": [
   "firewallRules"
 ],

or

"FirewallMatchesActions": [
   "allow",
   "block"
 ],
 "FirewallMatchesRuleIDs": [
   "1234abc",
   "1235abb"
 ],
 "FirewallMatchesSources": [
   "firewallRules"
 ],

But there may also be JSON data where these fields are not populated:

 "FirewallMatchesActions": [],
 "FirewallMatchesRuleIDs": [],
 "FirewallMatchesSources": [],

What would the ClickHouse table structure look like for this?

ClickHouse supports both Array and Nested column types.

It looks like Array is enough for your case:
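For comparison, here is a minimal sketch of the Nested variant. The table name `json_import_nested` and the subcolumn names `Action` and `RuleID` are made up for illustration and are not part of the original answer. Nested behaves like a group of arrays that must all have the same length within a row, so it only fits fields that are filled in parallel.

```sql
/* Hypothetical table: groups two parallel arrays under one Nested column */
CREATE TABLE json_import_nested (
  TimeStamp DateTime DEFAULT now(),
  FirewallMatches Nested(
    Action String,
    RuleID String
  )
) ENGINE = MergeTree()
ORDER BY (TimeStamp);

/* Each subcolumn is written as an array; arrays in one row must have equal length */
INSERT INTO json_import_nested (FirewallMatches.Action, FirewallMatches.RuleID)
VALUES (['allow', 'block'], ['1234abc', '1235abb']),
       ([], []);
```

In the sample JSON, FirewallMatchesSources holds one element while FirewallMatchesActions holds two, so that field could not share this Nested column with the others.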

CREATE TABLE json_import (
 TimeStamp DateTime DEFAULT now(),

 /* other columns */

 FirewallMatchesActions Array(String),
 FirewallMatchesRuleIDs Array(String),
 FirewallMatchesSources Array(String)
) ENGINE = MergeTree()
ORDER BY (TimeStamp);

/* insert test data */

INSERT INTO json_import (FirewallMatchesActions, FirewallMatchesRuleIDs, FirewallMatchesSources)
VALUES (['allow'], ['1234abc', '1235abb'], ['firewallRules']), 
      (['allow', 'block'], ['1234abc'], ['firewallRules']), 
      ([], [], []);

/* select data */
SELECT *
FROM json_import;

/* result
┌───────────TimeStamp─┬─FirewallMatchesActions─┬─FirewallMatchesRuleIDs─┬─FirewallMatchesSources─┐
│ 2020-06-12 06:06:17 │ ['allow']              │ ['1234abc','1235abb']  │ ['firewallRules']      │
│ 2020-06-12 06:06:17 │ ['allow','block']      │ ['1234abc']            │ ['firewallRules']      │
│ 2020-06-12 06:06:17 │ []                     │ []                     │ []                     │
└─────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┘
*/
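Since the goal is to import JSON rather than to type VALUES by hand, the same rows can be loaded with the JSONEachRow input format. This is a sketch that assumes each JSON object sits on one line and contains only the three keys shown. Real log records with additional keys would need matching columns, or the `input_format_skip_unknown_fields` setting enabled.

```sql
/* One JSON object per line; TimeStamp is omitted and falls back to its DEFAULT */
INSERT INTO json_import (FirewallMatchesActions, FirewallMatchesRuleIDs, FirewallMatchesSources)
FORMAT JSONEachRow
{"FirewallMatchesActions": ["allow", "block"], "FirewallMatchesRuleIDs": ["1234abc", "1235abb"], "FirewallMatchesSources": ["firewallRules"]}
{"FirewallMatchesActions": [], "FirewallMatchesRuleIDs": [], "FirewallMatchesSources": []}
```

Empty arrays need no special handling. Once loaded, the array columns can be filtered with the usual array functions, for example:

```sql
/* Rows where at least one matched action was 'block' */
SELECT count()
FROM json_import
WHERE has(FirewallMatchesActions, 'block');

/* Rows where no firewall rule matched at all */
SELECT count()
FROM json_import
WHERE empty(FirewallMatchesActions);
```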

To reduce storage consumption, consider using the LowCardinality type and column encodings (compression codecs).
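As a rough sketch of what that could look like, the table below wraps the array elements in LowCardinality and adds a codec to the timestamp. The table name `json_import_compact` is hypothetical, and the choice of columns is an assumption: it presumes actions and sources take only a handful of distinct values while rule IDs vary widely. Whether this actually saves space depends on the real data, so it is worth measuring.

```sql
/* Hypothetical variant of json_import tuned for storage size */
CREATE TABLE json_import_compact (
  /* Delta suits slowly increasing timestamps; ZSTD compresses the deltas */
  TimeStamp DateTime DEFAULT now() CODEC(Delta, ZSTD),

  /* Assumed to have few distinct values, so dictionary encoding should help */
  FirewallMatchesActions Array(LowCardinality(String)),
  FirewallMatchesSources Array(LowCardinality(String)),

  /* Assumed to have many distinct values, so left as plain String */
  FirewallMatchesRuleIDs Array(String)
) ENGINE = MergeTree()
ORDER BY (TimeStamp);
```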


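Additional information:

Nested Data Structures in ClickHouse

Reducing ClickHouse Storage Cost with the LowCardinality Type – Lessons from an Instana Engineer

New Encodings to Improve ClickHouse Efficiency

Source: https://dba.stackexchange.com/questions/268997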