Postgresql

PostgreSQL geospatial query is slow

  • August 14, 2018

After giving up on MySQL, I tried Elasticsearch. Now I want to see whether I can use PostgreSQL/PostGIS, which might let me use PostgreSQL alone.

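I need to fetch records from a table by distance (it doesn't have to be completely accurate) and sort them by distance. The table has 10 million records.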
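Since the distances don't need to be exact, a spherical approximation is usually good enough for ranking. As a rough reference only, here is a minimal Python sketch of the haversine great-circle distance. It assumes a spherical Earth with a mean radius of 6,371,008.8 m and (lat, lon) inputs in degrees, and the name haversine_m is hypothetical. This is not what PostGIS runs internally: in the plans below, the trailing true in the _st_distance(...) call appears to request the spheroid calculation.

```python
import math

# Mean Earth radius in meters (assumption: spherical Earth model).
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Approximate great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    # Haversine formula: a is the squared half-chord length between the points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # One degree of latitude along a meridian is roughly 111 km on this sphere.
    print(round(haversine_m(0.0, 0.0, 1.0, 0.0)))
```

On a sphere, ranking by this value gives the same order as ranking by the true great-circle distance. The approximation only matters for the absolute values in meters.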

When my query runs slower on PostgreSQL than it did on MySQL, I figure I must be doing something wrong.

Can I do better?

Table:

id | hash_id | town | geo_pt2 

geo_pt2 is of type geography.

Index:

CREATE INDEX geo_pt2_gix ON public.member_profile USING gist (geo_pt2)

Query:

SELECT hash_id, town
    , ST_Distance(t.x, geo_pt2) AS dist
FROM   member_profile, (SELECT ST_GeographyFromText('POINT(47.4667 8.3167)')) AS t(x)
WHERE  ST_DWithin(t.x, geo_pt2, 250000)
ORDER  BY dist
limit 100 offset 1000;

EXPLAIN output:

Limit  (cost=9.08..9.08 rows=1 width=53)
 ->  Sort  (cost=9.07..9.08 rows=1 width=53)
       Sort Key: (_st_distance('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, member_profile.geo_pt2, '0'::double precision, true))
       ->  Index Scan using geo_pt2_gix on member_profile  (cost=0.42..9.06 rows=1 width=53)
             Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
             Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))

I'm using PostgreSQL 10 on a modern server with high IOPS (NVMe), and the query takes 35 seconds.

After @Evan Carroll's suggestions for better performance:

EXPLAIN ANALYZE SELECT hash_id, town
    , ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM   member_profile
WHERE  ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER  BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 10000
FETCH NEXT 100 ROWS ONLY;

Limit  (cost=9.31..18.21 rows=1 width=61) (actual time=392.608..394.138 rows=100 loops=1)
 ->  Index Scan using geo_pt2_gix on member_profile  (cost=0.42..9.31 rows=1 width=61) (actual time=26.624..392.776 rows=10100 loops=1)
       Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
       Order By: (geo_pt2 <-> '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
       Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))
Planning time: 89.020 ms
Execution time: 395.039 ms

It gets slow when the user pages far toward the end:

EXPLAIN ANALYZE SELECT hash_id, town
    , ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM   member_profile
WHERE  ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER  BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 1000000
FETCH NEXT 100 ROWS ONLY;

Limit  (cost=9.31..18.21 rows=1 width=61) (actual time=28872.156..28873.239 rows=100 loops=1)
 ->  Index Scan using geo_pt2_gix on member_profile  (cost=0.42..9.31 rows=1 width=61) (actual time=32.441..28764.569 rows=1000100 loops=1)
       Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
       Order By: (geo_pt2 <-> '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
       Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))
Planning time: 50.979 ms
Execution time: 28875.403 ms
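The actual row count on the index scan (rows=1000100) is the tell. OFFSET does not skip rows for free: the executor still has to produce every row before the offset and then discard it. Here is a minimal Python sketch of that behavior. It uses a plain ordered iterator as a stand-in for the index scan, and the name paginate_offset is hypothetical.

```python
def paginate_offset(ordered_rows, offset, limit):
    """Emulate OFFSET/LIMIT over an ordered stream of rows.

    Returns the page and how many rows had to be pulled from the stream.
    Rows before `offset` are pulled and then thrown away, so the work
    grows with the offset, not with the page size.
    """
    page = []
    produced = 0
    for row in ordered_rows:
        produced += 1
        if produced <= offset:
            continue  # skipped row: still read, just not returned
        page.append(row)
        if len(page) == limit:
            break
    return page, produced


if __name__ == "__main__":
    for offset in (10_000, 1_000_000):
        page, produced = paginate_offset(iter(range(2_000_000)), offset, 100)
        print(f"OFFSET {offset}: returned {len(page)} rows, read {produced}")
```

For OFFSET 10000 and OFFSET 1000000 this reads 10,100 and 1,000,100 rows. Those are the same actual row counts as the two index scans above.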

First, use EXPLAIN ANALYZE (not just EXPLAIN), and show the output of \d for the table (in psql). As a first point,

ST_GeographyFromText('POINT(47.4667 8.3167)')

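should be written as ST_MakePoint(47.4667, 8.3167)::geography, which builds the point directly from its coordinates instead of parsing WKT text.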

Your problem is this pattern:

SELECT ST_Distance( ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
...
ORDER  BY dist
LIMIT 100 OFFSET 1000;

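Every time you do this, you have to compute the distance for at least 1,100 rows. That by itself shouldn't be slow. It is slow because, to order that way, you have to compute ST_Distance for all the rows that pass the ST_DWithin filter and then sort them. Here we can avoid that with KNN, using the <-> operator. MySQL doesn't support KNN.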
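Here is a toy Python model of the difference. sort_all computes a distance for every row before it can return the first one, which is what the Sort node in the first plan does. knn_stream instead does a best-first (incremental nearest-neighbor) search over a priority queue. It opens a bucket only when that bucket's lower-bound distance is the smallest key left, so distant rows are never measured. This is only an analogy for the index-ordered scan behind <-> and rests on assumptions: planar coordinates, a uniform grid in place of a real GiST index, and hypothetical function names throughout.

```python
import heapq
import math
from collections import defaultdict
from itertools import islice


def sort_all(points, qx, qy, stats):
    """ORDER BY distance the expensive way: measure every point, then sort."""
    out = []
    for pid, (x, y) in points.items():
        stats["distance_calls"] += 1
        out.append((math.hypot(x - qx, y - qy), pid))
    out.sort()
    return out


def build_grid(points, cell):
    """Bucket points into square cells of side `cell` (toy stand-in for an index)."""
    grid = defaultdict(list)
    for pid, (x, y) in points.items():
        grid[(math.floor(x / cell), math.floor(y / cell))].append((pid, x, y))
    return grid


def cell_min_dist(qx, qy, cx, cy, cell):
    """Lower bound on the distance from (qx, qy) to anything inside cell (cx, cy)."""
    dx = max(cx * cell - qx, 0.0, qx - (cx + 1) * cell)
    dy = max(cy * cell - qy, 0.0, qy - (cy + 1) * cell)
    return math.hypot(dx, dy)


def knn_stream(grid, cell, qx, qy, stats):
    """Yield (distance, pid) nearest first, opening cells lazily.

    Heap entries are (key, kind, item): kind 1 is a cell keyed by its lower
    bound, kind 0 is a point keyed by its exact distance. When a point is
    popped, every remaining entry has a key >= its distance, so it is safe
    to emit it.
    """
    heap = [(cell_min_dist(qx, qy, cx, cy, cell), 1, (cx, cy)) for cx, cy in grid]
    heapq.heapify(heap)
    while heap:
        key, kind, item = heapq.heappop(heap)
        if kind == 0:
            yield key, item
            continue
        for pid, x, y in grid[item]:  # open the cell: measure its points
            stats["distance_calls"] += 1
            heapq.heappush(heap, (math.hypot(x - qx, y - qy), 0, pid))


if __name__ == "__main__":
    pts = {i * 100 + j: (float(i), float(j)) for i in range(100) for j in range(100)}
    full_stats, knn_stats = {"distance_calls": 0}, {"distance_calls": 0}
    sort_all(pts, 50.3, 50.7, full_stats)
    grid = build_grid(pts, 10.0)
    list(islice(knn_stream(grid, 10.0, 50.3, 50.7, knn_stats), 10))
    print("sort_all distance calls:", full_stats["distance_calls"])
    print("knn_stream distance calls for 10 rows:", knn_stats["distance_calls"])
```

With 10,000 points on a 100 x 100 lattice and 10 x 10 cells, sort_all measures all 10,000 points. Taking the first 10 results from knn_stream only measures points in the handful of cells around the query point.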

SELECT hash_id, town
    , ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM   member_profile
WHERE  ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER  BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 1000
FETCH NEXT 100 ROWS ONLY;

As a matter of style, I personally prefer OFFSET/FETCH (the SQL-standard way of writing LIMIT/OFFSET).

Pagination

I'm not sure this will work, but it may be worth a try (let us know how it goes).

SELECT hash_id, town
    , ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
    , ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 AS myknn
FROM member_profile
WHERE ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
 AND ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 > OLD_VALUE
ORDER BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
FETCH NEXT 100 ROWS ONLY;

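So the first time you run it, you save the value of myknn from the last row. The next time you run it, you substitute that value for OLD_VALUE in this clause: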

AND ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 > OLD_VALUE

So each time you run it, you save the new point to continue from, and fetch the next page with FETCH NEXT x ROWS ONLY.

myknn and dist may be identical in your case. If so, you can drop one of them.
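One caveat with the OLD_VALUE approach: the predicate compares the distance alone with a strict >. If several rows share exactly the distance of the last row on a page (for example, members whose geo_pt2 is the same point), the ones that did not fit on that page are skipped. The usual keyset-pagination fix is to order by (distance, id) and remember both values of the last row. Whether PostgreSQL can still drive such a predicate from the KNN index is something to verify with EXPLAIN ANALYZE. The minimal Python sketch below only models the cursor logic. It works over an in-memory list of (dist, id) pairs already sorted nearest first, and the names naive_page, keyset_page and walk are hypothetical.

```python
import bisect


def naive_page(sorted_rows, last_dist, limit):
    """Cursor on distance only, like '<-> geo_pt2 > OLD_VALUE'."""
    if last_dist is None:
        return sorted_rows[:limit]
    return [row for row in sorted_rows if row[0] > last_dist][:limit]


def keyset_page(sorted_rows, cursor, limit):
    """Cursor on the (dist, id) pair of the last row already shown."""
    start = 0 if cursor is None else bisect.bisect_right(sorted_rows, cursor)
    return sorted_rows[start:start + limit]


def walk(sorted_rows, limit, use_pair):
    """Fetch every page in turn and return the ids seen, in order."""
    seen, cursor = [], None
    while True:
        if use_pair:
            page = keyset_page(sorted_rows, cursor, limit)
        else:
            page = naive_page(sorted_rows, cursor, limit)
        if not page:
            return seen
        seen.extend(row_id for _, row_id in page)
        # Remember where this page ended: the full pair, or just the distance.
        cursor = page[-1] if use_pair else page[-1][0]


if __name__ == "__main__":
    rows = [(0.5, 7), (1.2, 3), (1.2, 9), (1.2, 11), (2.0, 4)]
    print("distance-only cursor:", walk(rows, 2, use_pair=False))
    print("(distance, id) cursor:", walk(rows, 2, use_pair=True))
```

Take the rows above with pages of 2. The distance-only cursor returns ids 7, 3, 4 and never shows 9 and 11. The pair cursor returns all five.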

Quoted from: https://dba.stackexchange.com/questions/214782