Postgresql

Good layout for 3D point data in Postgres for spatial queries?

  • January 19, 2018

As another question of mine shows, I deal with a lot (>10,000,000) of entries of points in 3D space. These points are defined like this:

CREATE TYPE float3d AS (
 x real,
 y real,
 z real);

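If I am not mistaken, it takes 3*8 bytes + 8 bytes of padding (MAXALIGN is 8) to store one of these points. Is there a better way to store this kind of data? In the question mentioned above it was said that composite types involve quite some overhead.

I often run spatial queries like this: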

 SELECT t1.id, t1.parent_id, (t1.location).x, (t1.location).y, (t1.location).z,
        t1.confidence, t1.radius, t1.skeleton_id, t1.user_id,
        t2.id, t2.parent_id, (t2.location).x, (t2.location).y, (t2.location).z,
        t2.confidence, t2.radius, t2.skeleton_id, t2.user_id
 FROM treenode t1
      INNER JOIN treenode t2 ON
        (   (t1.id = t2.parent_id OR t1.parent_id = t2.id)
         OR (t1.parent_id IS NULL AND t1.id = t2.id))
       WHERE (t1.LOCATION).z = 41000.0
         AND (t1.LOCATION).x > 2822.6
         AND (t1.LOCATION).x < 62680.2
         AND (t1.LOCATION).y > 33629.8
         AND (t1.LOCATION).y < 65458.6
         AND t1.project_id = 1 LIMIT 5000;

A query like this takes about 160 ms, but I wonder whether that can be reduced.

This is the layout of the table the type is used in:

   Column     |           Type           |                       Modifiers                    
---------------+--------------------------+-------------------------------------------------------
id            | bigint                   | not null default nextval('location_id_seq'::regclass)
user_id       | integer                  | not null
creation_time | timestamp with time zone | not null default now()
edition_time  | timestamp with time zone | not null default now()
project_id    | integer                  | not null
location      | float3d                  | not null
editor_id     | integer                  |
parent_id     | bigint                   |
radius        | real                     | not null default 0
confidence    | smallint                 | not null default 5
skeleton_id   | integer                  | not null

Indexes:
   "treenode_pkey" PRIMARY KEY, btree (id)
   "treenode_parent_id" btree (parent_id)
   "treenode_project_id_location_x_index" btree (project_id, ((location).x))
   "treenode_project_id_location_y_index" btree (project_id, ((location).y))
   "treenode_project_id_location_z_index" btree (project_id, ((location).z))
   "treenode_project_id_skeleton_id_index" btree (project_id, skeleton_id)
   "treenode_project_id_user_id_index" btree (project_id, user_id)
   "treenode_skeleton_id_index" btree (skeleton_id)

Composite types are a clean design, but they do not help performance at all.

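First off, float translates to float8, a.k.a. double precision, in Postgres. You are building on a misunderstanding.

The real data type occupies 4 bytes (not 8). It has to be aligned at multiples of 4 bytes.

Measure the actual size with pg_column_size().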

An SQL Fiddle demonstrates the actual sizes.
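As a minimal sketch of such a measurement, the statements below compare the size of a single real, a single float8, and a composite value. The type name demo_float3d is a hypothetical stand-in for the type from the question, and everything runs inside a transaction that is rolled back, so nothing is left behind.

```sql
BEGIN;

-- Hypothetical demo type with the same fields as float3d in the question.
CREATE TYPE demo_float3d AS (x real, y real, z real);

SELECT pg_column_size(1::real)   AS real_bytes,      -- one real value
       pg_column_size(1::float8) AS float8_bytes,    -- one double precision value
       pg_column_size(ROW(1::real, 2::real, 3::real)::demo_float3d)
                                 AS composite_bytes; -- header + padding + 3 reals

ROLLBACK;
```

The composite figure includes the per-value header, which is why it comes out far larger than the 12 bytes of actual coordinate data.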

The composite type real3d occupies 36 bytes. That is:

23 byte tuple header
1 byte padding
4 bytes real x
4 bytes real y
4 bytes real z
---
36 bytes

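If you embed that in a table, padding may have to be added. On the other hand, the header of the type can be 3 bytes smaller on disk. The on-disk representation is typically a bit smaller than the one in RAM. It does not make much of a difference.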

Table layout

Use this equivalent design to substantially reduce the row size:

   Column     |           Type           |                       Modifiers
---------------+--------------------------+---------------------------------
id            | bigint                   | not null default nextval(...
creation_time | timestamp with time zone | not null default now()
edition_time  | timestamp with time zone | not null default now()
user_id       | integer                  | not null
project_id    | integer                  | not null
location_x    | real                     | not null
location_y    | real                     | not null
location_z    | real                     | not null
radius        | real                     | not null default 0
skeleton_id   | integer                  | not null
confidence    | smallint                 | not null default 5
parent_id     | bigint                   |
editor_id     | integer                  |
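
One way this layout could be written as DDL is sketched below. The table name treenode_flat and the index names are hypothetical, and bigserial stands in for the existing sequence default, which the listing above abbreviates. The indexes are plain-column counterparts of the expression indexes from the question, and the final query shows how the spatial filter would read against the flat columns.

```sql
-- Sketch only: treenode_flat and the index names are assumptions for illustration.
CREATE TABLE treenode_flat (
    id            bigserial PRIMARY KEY,              -- assumption: own sequence
    creation_time timestamptz NOT NULL DEFAULT now(),
    edition_time  timestamptz NOT NULL DEFAULT now(),
    user_id       integer     NOT NULL,
    project_id    integer     NOT NULL,
    location_x    real        NOT NULL,
    location_y    real        NOT NULL,
    location_z    real        NOT NULL,
    radius        real        NOT NULL DEFAULT 0,
    skeleton_id   integer     NOT NULL,
    confidence    smallint    NOT NULL DEFAULT 5,
    parent_id     bigint,
    editor_id     integer
);

-- Plain-column counterparts of the indexes on (location).x, .y and .z.
CREATE INDEX treenode_flat_project_id_location_x_idx ON treenode_flat (project_id, location_x);
CREATE INDEX treenode_flat_project_id_location_y_idx ON treenode_flat (project_id, location_y);
CREATE INDEX treenode_flat_project_id_location_z_idx ON treenode_flat (project_id, location_z);

-- The spatial filter from the question, rewritten for the flat columns.
SELECT id, parent_id, location_x, location_y, location_z
FROM   treenode_flat
WHERE  location_z = 41000.0
AND    location_x > 2822.6  AND location_x < 62680.2
AND    location_y > 33629.8 AND location_y < 65458.6
AND    project_id = 1
LIMIT  5000;
```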

Test before and after to verify my claim:

SELECT pg_relation_size('treenode') As table_size;

SELECT avg(pg_column_size(t)) AS avg_row_size
FROM   treenode t;
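
To try the comparison without touching the real table, a small self-contained experiment along these lines may help. All object names (demo_float3d, demo_composite, demo_flat) are hypothetical, the row count is arbitrary, and the whole thing is rolled back at the end.

```sql
BEGIN;

-- Hypothetical stand-ins for the composite and the flat design.
CREATE TYPE demo_float3d AS (x real, y real, z real);
CREATE TEMP TABLE demo_composite (id bigint NOT NULL, location demo_float3d NOT NULL);
CREATE TEMP TABLE demo_flat (id bigint NOT NULL, x real NOT NULL, y real NOT NULL, z real NOT NULL);

-- Fill both tables with the same synthetic points.
INSERT INTO demo_composite
SELECT g, ROW(g::real, g::real, g::real)::demo_float3d
FROM   generate_series(1, 100000) g;

INSERT INTO demo_flat
SELECT g, g::real, g::real, g::real
FROM   generate_series(1, 100000) g;

-- Same measurements as above, applied to each variant.
SELECT pg_relation_size('demo_composite') AS composite_table_size,
       pg_relation_size('demo_flat')      AS flat_table_size;

SELECT (SELECT avg(pg_column_size(t)) FROM demo_composite t) AS composite_avg_row_size,
       (SELECT avg(pg_column_size(t)) FROM demo_flat t)      AS flat_avg_row_size;

ROLLBACK;
```

The numbers from such a toy table only indicate the direction of the difference; the real table has more columns, so the measurements on treenode itself remain the ones that matter.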

Source: https://dba.stackexchange.com/questions/72787