PostgreSQL
Good layout for 3D point data for spatial queries in Postgres?
As mentioned in another question, I work with a large number (>10,000,000) of point entries in 3D space. The points are defined like this:
CREATE TYPE float3d AS ( x real, y real, z real);
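If I'm not mistaken, storing one of these points takes 3*8 bytes + 8 bytes of padding (MAXALIGN is 8). Is there a better way to store this kind of data? In the question mentioned above, it was said that composite types carry considerable overhead. I often run spatial queries like this one: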
SELECT t1.id, t1.parent_id, (t1.location).x, (t1.location).y, (t1.location).z,
       t1.confidence, t1.radius, t1.skeleton_id, t1.user_id,
       t2.id, t2.parent_id, (t2.location).x, (t2.location).y, (t2.location).z,
       t2.confidence, t2.radius, t2.skeleton_id, t2.user_id
FROM   treenode t1
INNER  JOIN treenode t2 ON (   (t1.id = t2.parent_id OR t1.parent_id = t2.id)
                            OR (t1.parent_id IS NULL AND t1.id = t2.id))
WHERE  (t1.LOCATION).z = 41000.0
AND    (t1.LOCATION).x > 2822.6
AND    (t1.LOCATION).x < 62680.2
AND    (t1.LOCATION).y > 33629.8
AND    (t1.LOCATION).y < 65458.6
AND    t1.project_id = 1
LIMIT  5000;
A query like this takes about 160 ms, and I wonder whether that can be reduced.
This is the layout of the table that uses the type:
    Column     |           Type           |                       Modifiers
---------------+--------------------------+-------------------------------------------------------
 id            | bigint                   | not null default nextval('location_id_seq'::regclass)
 user_id       | integer                  | not null
 creation_time | timestamp with time zone | not null default now()
 edition_time  | timestamp with time zone | not null default now()
 project_id    | integer                  | not null
 location      | float3d                  | not null
 editor_id     | integer                  |
 parent_id     | bigint                   |
 radius        | real                     | not null default 0
 confidence    | smallint                 | not null default 5
 skeleton_id   | integer                  | not null
Indexes:
    "treenode_pkey" PRIMARY KEY, btree (id)
    "treenode_parent_id" btree (parent_id)
    "treenode_project_id_location_x_index" btree (project_id, ((location).x))
    "treenode_project_id_location_y_index" btree (project_id, ((location).y))
    "treenode_project_id_location_z_index" btree (project_id, ((location).z))
    "treenode_project_id_skeleton_id_index" btree (project_id, skeleton_id)
    "treenode_project_id_user_id_index" btree (project_id, user_id)
    "treenode_skeleton_id_index" btree (skeleton_id)
Composite types make for a clean design, but they do not help performance at all.
First of all, in Postgres float translates to float8, aka double precision. (The real type used in your question is float4.)
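You are building on a misconception. The data type real occupies 4 bytes (not 8), and it must be aligned at multiples of 4 bytes. Measure the actual size with pg_column_size(). The SQL Fiddle demonstrates the actual sizes.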
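As a quick cross-check of the raw byte counts (a sketch outside Postgres, not a measurement of it): Postgres real and double precision are IEEE 754 single and double precision, which Python's struct module packs as the C types float and double. With standard sizes and no alignment padding, the payload of three real coordinates is 12 bytes, not the 24 bytes assumed in the question.

```python
import struct

# "=" selects standard sizes with no alignment padding.
# "f" is IEEE 754 single precision (like Postgres real / float4),
# "d" is IEEE 754 double precision (like Postgres double precision / float8).
REAL_BYTES = struct.calcsize("=f")
DOUBLE_BYTES = struct.calcsize("=d")

# Raw payload of one point with coordinates x, y, z.
three_reals = struct.calcsize("=fff")
three_doubles = struct.calcsize("=ddd")

print(REAL_BYTES, DOUBLE_BYTES)    # 4 8
print(three_reals, three_doubles)  # 12 24
```

This only covers the coordinate payload; pg_column_size() remains the way to see what Postgres actually stores.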
The composite type real3d (three real fields, like float3d in the question) occupies 36 bytes. That is:
23 bytes  tuple header
 1 byte   padding
 4 bytes  real x
 4 bytes  real y
 4 bytes  real z
---
36 bytes
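The breakdown above can be reproduced with a small alignment helper. This is a sketch that assumes, following the breakdown, that the field data starts at the next MAXALIGN boundary (8, as stated in the question) after the 23-byte header and that each real is 4-byte aligned. The names align and composite_size are hypothetical helpers, not Postgres functions.

```python
def align(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


TUPLE_HEADER = 23  # bytes, per the breakdown above
MAXALIGN = 8       # as given in the question (typical 64-bit build)
REAL_SIZE = 4
REAL_ALIGN = 4


def composite_size(n_real_fields: int) -> int:
    """Size of a composite value with n real fields, following the breakdown above."""
    # Field data starts after the header, padded up to MAXALIGN.
    offset = align(TUPLE_HEADER, MAXALIGN)
    for _ in range(n_real_fields):
        offset = align(offset, REAL_ALIGN) + REAL_SIZE
    return offset


print(align(TUPLE_HEADER, MAXALIGN) - TUPLE_HEADER)  # 1 (byte of padding)
print(composite_size(3))                             # 36
```

Three plain real columns, by contrast, add only their 12 bytes of data (plus any alignment padding) to the row, with no per-value header.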
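If you embed it in a table, additional padding may be needed. On the other hand, the header of the type can be 3 bytes smaller on disk; the on-disk representation is generally a bit smaller than the in-memory one. Not much difference overall.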
Table layout
Use this equivalent design to reduce the row size substantially:
    Column     |           Type           |            Modifiers
---------------+--------------------------+---------------------------------
 id            | bigint                   | not null default nextval(...
 creation_time | timestamp with time zone | not null default now()
 edition_time  | timestamp with time zone | not null default now()
 user_id       | integer                  | not null
 project_id    | integer                  | not null
 location_x    | real                     | not null
 location_y    | real                     | not null
 location_z    | real                     | not null
 radius        | real                     | not null default 0
 skeleton_id   | integer                  | not null
 confidence    | smallint                 | not null default 5
 parent_id     | bigint                   |
 editor_id     | integer                  |
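Besides dropping the composite type, the layout above also reorders the columns. Each fixed-width column starts at the next multiple of its alignment, so interleaving 8-byte and 4-byte types leaves gaps. The sketch below compares the data area of the layout above with the original column order after splitting location into three real columns. It assumes typical 64-bit alignments (8 for bigint and timestamptz, 4 for integer and real, 2 for smallint), treats every column as non-null, and ignores the tuple header, the null bitmap and the final alignment of the whole tuple. The helper names are hypothetical.

```python
from typing import List, Tuple

# (column name, size in bytes, alignment in bytes); typical 64-bit values assumed.
Column = Tuple[str, int, int]

BIGINT = (8, 8)
TIMESTAMPTZ = (8, 8)
INTEGER = (4, 4)
REAL = (4, 4)
SMALLINT = (2, 2)


def col(name: str, kind: Tuple[int, int]) -> Column:
    return (name, *kind)


def align(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def data_area(columns: List[Column]) -> Tuple[int, int]:
    """Return (bytes used, bytes of padding) for non-null fixed-width columns in order."""
    offset = padding = 0
    for _name, size, alignment in columns:
        start = align(offset, alignment)
        padding += start - offset
        offset = start + size
    return offset, padding


# Original column order, with location split into three real columns.
ORIGINAL_ORDER = [
    col("id", BIGINT), col("user_id", INTEGER),
    col("creation_time", TIMESTAMPTZ), col("edition_time", TIMESTAMPTZ),
    col("project_id", INTEGER),
    col("location_x", REAL), col("location_y", REAL), col("location_z", REAL),
    col("editor_id", INTEGER), col("parent_id", BIGINT),
    col("radius", REAL), col("confidence", SMALLINT), col("skeleton_id", INTEGER),
]

# Column order of the proposed layout.
PROPOSED_ORDER = [
    col("id", BIGINT),
    col("creation_time", TIMESTAMPTZ), col("edition_time", TIMESTAMPTZ),
    col("user_id", INTEGER), col("project_id", INTEGER),
    col("location_x", REAL), col("location_y", REAL), col("location_z", REAL),
    col("radius", REAL), col("skeleton_id", INTEGER), col("confidence", SMALLINT),
    col("parent_id", BIGINT), col("editor_id", INTEGER),
]

print(data_area(ORIGINAL_ORDER))  # (76, 10)
print(data_area(PROPOSED_ORDER))  # (68, 2)
```

The nullable columns parent_id and editor_id come last. When they are NULL, they take no space in the data area, because Postgres records a NULL only in the null bitmap.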
Test before and after to verify my claim:
SELECT pg_relation_size('treenode') AS table_size;
SELECT avg(pg_column_size(t)) AS avg_row_size FROM treenode t;