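PostgreSQL geospatial query is slow
After giving up on MySQL, I tried Elasticsearch. Now I want to see whether I can use PostgreSQL/PostGIS, which would let me use PostgreSQL only.
I need to fetch records from a table by distance (it doesn't have to be exact) and sort them by distance. The table has 10 million records.
Since the query is even slower on PostgreSQL than it was on MySQL, I must be doing something wrong.
Can I do better?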
Table:
id | hash_id | town | geo_pt2
(geo_pt2 is of type geography)
Index:
CREATE INDEX geo_pt2_gix ON public.member_profile USING gist (geo_pt2)
Query:
SELECT hash_id, town, ST_Distance(t.x, geo_pt2) AS dist
FROM member_profile,
     (SELECT ST_GeographyFromText('POINT(47.4667 8.3167)')) AS t(x)
WHERE ST_DWithin(t.x, geo_pt2, 250000)
ORDER BY dist
LIMIT 100 OFFSET 1000;
EXPLAIN:
Limit (cost=9.08..9.08 rows=1 width=53)
  -> Sort (cost=9.07..9.08 rows=1 width=53)
       Sort Key: (_st_distance('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, member_profile.geo_pt2, '0'::double precision, true))
       -> Index Scan using geo_pt2_gix on member_profile (cost=0.42..9.06 rows=1 width=53)
            Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
            Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))
I'm using PostgreSQL 10 on a modern server with high IOPS (NVMe), and the query takes 35 seconds.
Update, after applying @Evan Carroll's suggestions for better performance:
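For reference, here is a minimal Python model of what the original query asks for logically (not how PostgreSQL executes it). Assumptions: rows are hypothetical (hash_id, town, lon, lat) tuples, and distance is haversine on a sphere with mean radius 6,371,008.8 m, whereas ST_Distance on geography defaults to the WGS84 spheroid, so the numbers differ slightly. The model has no index, so it computes every distance. With the GiST index PostgreSQL only tests bounding-box candidates, but ORDER BY dist still needs the distance of every row within 250 km before the first row can be returned, whatever OFFSET and LIMIT say.

```python
import math

# Mean Earth radius in metres (assumption: spherical model; PostGIS's
# ST_Distance on geography uses the WGS84 spheroid by default).
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (lon, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against a tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def nearby_page(rows, lon, lat, radius_m, offset, limit, stats):
    """Model of: WHERE ST_DWithin(...) ORDER BY dist LIMIT limit OFFSET offset.

    rows: iterable of hypothetical (hash_id, town, lon, lat) tuples.
    stats["distance_calls"] counts how many distances were computed.
    """
    matches = []
    for hash_id, town, plon, plat in rows:
        stats["distance_calls"] += 1
        dist = haversine_m(lon, lat, plon, plat)
        if dist <= radius_m:
            matches.append((dist, hash_id, town))
    # The sort needs every matching distance before any row can be returned.
    matches.sort()
    return matches[offset:offset + limit]


# Usage: 3,000 hypothetical rows along one parallel, all within 250 km.
rows = [(f"h{i}", "town", 8.0 + i * 0.001, 47.0) for i in range(3000)]
stats = {"distance_calls": 0}
page = nearby_page(rows, 8.0, 47.0, 250_000, offset=1000, limit=100, stats=stats)
```

Only 100 rows come back, yet stats["distance_calls"] ends at 3,000: the work grows with the number of matching rows, not with the page size.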
EXPLAIN ANALYZE
SELECT hash_id, town,
       ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM member_profile
WHERE ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 10000 FETCH NEXT 100 ROWS ONLY;
Limit (cost=9.31..18.21 rows=1 width=61) (actual time=392.608..394.138 rows=100 loops=1)
  -> Index Scan using geo_pt2_gix on member_profile (cost=0.42..9.31 rows=1 width=61) (actual time=26.624..392.776 rows=10100 loops=1)
       Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
       Order By: (geo_pt2 <-> '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
       Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))
Planning time: 89.020 ms
Execution time: 395.039 ms
It gets slower the further the user pages toward the end:
EXPLAIN ANALYZE
SELECT hash_id, town,
       ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM member_profile
WHERE ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 1000000 FETCH NEXT 100 ROWS ONLY;
Limit (cost=9.31..18.21 rows=1 width=61) (actual time=28872.156..28873.239 rows=100 loops=1)
  -> Index Scan using geo_pt2_gix on member_profile (cost=0.42..9.31 rows=1 width=61) (actual time=32.441..28764.569 rows=1000100 loops=1)
       Index Cond: (geo_pt2 && '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
       Order By: (geo_pt2 <-> '0101000020E610000088855AD3BCBB474052499D8026A22040'::geography)
       Filter: (('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography && _st_expand(geo_pt2, '250000'::double precision)) AND _st_dwithin('0101000020E610000088855AD3BCBB474052499D8026A22040'::geography, geo_pt2, '250000'::double precision, true))
Planning time: 50.979 ms
Execution time: 28875.403 ms
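The two plans differ mainly in how many rows the index scan has to produce: rows=10100 for OFFSET 10000 and rows=1000100 for OFFSET 1000000. OFFSET does not jump ahead; every skipped row is still produced and then thrown away. A minimal Python model of that behaviour (the function name and the counter are illustrative, not PostgreSQL internals):

```python
from itertools import islice


def offset_fetch(ordered_rows, offset, fetch, stats):
    """Model OFFSET offset FETCH NEXT fetch ROWS ONLY over an ordered scan.

    ordered_rows stands in for an index scan that already yields rows in
    distance order; stats["rows_scanned"] counts how many rows it produced.
    """
    def scan():
        for row in ordered_rows:
            stats["rows_scanned"] += 1
            yield row

    # islice discards the first `offset` rows, but the scan still produces them.
    return list(islice(scan(), offset, offset + fetch))


# Usage: the same two page positions as the plans above.
near_stats = {"rows_scanned": 0}
near_page = offset_fetch(range(2_000_000), 10_000, 100, near_stats)
far_stats = {"rows_scanned": 0}
far_page = offset_fetch(range(2_000_000), 1_000_000, 100, far_stats)
```

Both calls return 100 rows, but the scan produces 10,100 rows for the first and 1,000,100 for the second, matching the row counts in the plans.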
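Answer:
First, use EXPLAIN ANALYZE (not just EXPLAIN) and show the output of \d for the table (in psql). As a first point, ST_GeographyFromText('POINT(47.4667 8.3167)') should be written as ST_MakePoint(47.4667, 8.3167)::geography, which builds the point directly instead of parsing WKT text.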
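Both forms produce the same point: the hex constant that appears in the EXPLAIN output of both the original query and the ST_MakePoint version is that point encoded as little-endian EWKB. The small decoder below is a hypothetical helper that handles only 2D points; it shows SRID 4326 with X = 47.4667 and Y = 8.3167. For geography, X is longitude and Y is latitude, so if 47.4667 / 8.3167 were meant as latitude / longitude, the arguments are swapped in both queries.

```python
import struct

WKB_POINT = 1
EWKB_SRID_FLAG = 0x20000000
EWKB_Z_FLAG = 0x80000000
EWKB_M_FLAG = 0x40000000


def decode_ewkb_point(hex_str):
    """Decode a 2D (E)WKB point from hex; returns (srid or None, x, y)."""
    data = bytes.fromhex(hex_str)
    endian = "<" if data[0] == 1 else ">"  # byte 0: 1 = little-endian, 0 = big-endian
    (geom_type,) = struct.unpack_from(endian + "I", data, 1)
    if geom_type & (EWKB_Z_FLAG | EWKB_M_FLAG):
        raise ValueError("only 2D points are handled in this sketch")
    if (geom_type & 0x0FFFFFFF) != WKB_POINT:
        raise ValueError("not a point")
    offset = 5
    srid = None
    if geom_type & EWKB_SRID_FLAG:
        # EWKB stores the SRID right after the type word when the flag is set
        (srid,) = struct.unpack_from(endian + "I", data, offset)
        offset += 4
    x, y = struct.unpack_from(endian + "dd", data, offset)
    return srid, x, y


srid, x, y = decode_ewkb_point("0101000020E610000088855AD3BCBB474052499D8026A22040")
print(srid, round(x, 4), round(y, 4))  # prints: 4326 47.4667 8.3167
```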
Your problem is this pattern:
SELECT ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
...
ORDER BY dist
LIMIT 100 OFFSET 1000;
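Every time you do this, you have to compute the distance for at least 1,100 rows. That alone shouldn't be slow. It is slow because, to sort by dist, you actually have to compute ST_Distance for every row that matches the WHERE clause, not just the 1,100 you keep. We can avoid that by using a KNN search with the <-> operator, which lets the GiST index return rows already ordered by distance. MySQL doesn't support KNN.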
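The idea behind KNN ordering can be sketched as a best-first search: keep a priority queue of index cells keyed by the smallest distance anything inside them could have, and only open a cell when it reaches the front of the queue. This Python model is a simplification, not PostGIS's GiST code: a flat grid of square cells stands in for the index tree, distances are planar, and all names are hypothetical. Because knn() is a generator, taking the first k results only computes distances for points in the cells it actually opened.

```python
import heapq
import math
import random
from collections import defaultdict
from itertools import count, islice


def build_grid(points, cell_size):
    """Bucket (id, x, y) tuples into square cells keyed by (cx, cy)."""
    grid = defaultdict(list)
    for pid, x, y in points:
        grid[(math.floor(x / cell_size), math.floor(y / cell_size))].append((pid, x, y))
    return grid


def cell_min_dist(qx, qy, cx, cy, cell_size):
    """Lower bound on the distance from (qx, qy) to any point inside cell (cx, cy)."""
    x0, y0 = cx * cell_size, cy * cell_size
    dx = max(x0 - qx, 0.0, qx - (x0 + cell_size))
    dy = max(y0 - qy, 0.0, qy - (y0 + cell_size))
    return math.hypot(dx, dy)


def knn(grid, cell_size, qx, qy, stats):
    """Yield (distance, id) in ascending distance order, lazily."""
    tie = count()  # tiebreaker so heapq never compares the payloads
    heap = [(cell_min_dist(qx, qy, cx, cy, cell_size), next(tie), "cell", (cx, cy))
            for (cx, cy) in grid]
    heapq.heapify(heap)
    while heap:
        dist, _, kind, item = heapq.heappop(heap)
        if kind == "point":
            # Every cell or point still queued is at least this far away.
            yield dist, item
            continue
        for pid, x, y in grid[item]:  # open the cell: compute real distances
            stats["distance_calls"] += 1
            heapq.heappush(heap, (math.hypot(x - qx, y - qy), next(tie), "point", pid))


# Usage: 10,000 random points, 5 x 5 cells, the 10 nearest to (50, 50).
rng = random.Random(42)
points = [(i, rng.uniform(0, 100), rng.uniform(0, 100)) for i in range(10_000)]
grid = build_grid(points, 5.0)
stats = {"distance_calls": 0}
nearest = [pid for _, pid in islice(knn(grid, 5.0, 50.0, 50.0, stats), 10)]
```

Here stats["distance_calls"] stays far below 10,000, because only the cells around (50, 50) are ever opened. That is the same effect the Order By line in the KNN plan gives PostgreSQL.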
SELECT hash_id, town,
       ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist
FROM member_profile
WHERE ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
ORDER BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
OFFSET 1000 FETCH NEXT 100 ROWS ONLY;
As a stylistic note, I personally prefer OFFSET/FETCH (the SQL-standard form of LIMIT/OFFSET).
Pagination
I'm not sure this will work, but it may be worth a try (let us know how it goes).
SELECT hash_id, town,
       ST_Distance(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2) AS dist,
       ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 AS myknn
FROM member_profile
WHERE ST_DWithin(ST_MakePoint(47.4667, 8.3167)::geography, geo_pt2, 250000)
  AND ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 > OLD_VALUE
ORDER BY ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2
FETCH NEXT 100 ROWS ONLY;
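The first time you run it (without the OLD_VALUE condition), save the myknn value of the last row. On the next run, substitute that value for OLD_VALUE in the clause AND ST_MakePoint(47.4667, 8.3167)::geography <-> geo_pt2 > OLD_VALUE. Each run therefore saves a new point to continue from, and FETCH NEXT x ROWS ONLY sets the page size.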
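The cursor logic can be checked with a small Python model in which each row is a (distance, id) pair standing in for (myknn, id). One assumption departs from the answer: the cursor also carries id as a tiebreaker. With a cursor on distance alone (> OLD_VALUE), rows whose distance exactly equals OLD_VALUE but that did not fit on the previous page are skipped. The sketch only checks that consecutive pages neither repeat nor skip rows. It says nothing about whether PostgreSQL can use the KNN index to jump straight to OLD_VALUE, and whether an SQL version with a tiebreaker keeps the KNN index scan is untested.

```python
from itertools import islice


def keyset_page(sorted_rows, cursor, size):
    """Next page after `cursor`, a (distance, id) tuple, or None for page 1."""
    remaining = (row for row in sorted_rows if cursor is None or row > cursor)
    page = list(islice(remaining, size))
    return page, (page[-1] if page else cursor)


def distance_only_page(sorted_rows, old_value, size):
    """The answer's filter as written: keep rows whose distance is > OLD_VALUE."""
    remaining = (row for row in sorted_rows if old_value is None or row[0] > old_value)
    page = list(islice(remaining, size))
    return page, (page[-1][0] if page else old_value)


def collect_all(page_fn, sorted_rows, size):
    """Walk every page, feeding each page's cursor into the next call."""
    out, cursor = [], None
    while True:
        page, cursor = page_fn(sorted_rows, cursor, size)
        if not page:
            return out
        out.extend(page)


# Three rows tie at distance 1.0, so a page boundary falls inside the tie.
rows = sorted([(1.0, 3), (1.0, 1), (2.0, 2), (1.0, 2), (3.0, 5), (2.0, 7)])
with_tiebreak = collect_all(keyset_page, rows, 2)
distance_only = collect_all(distance_only_page, rows, 2)
```

with_tiebreak reproduces all six rows, while distance_only loses (1.0, 3): the first page ends at distance 1.0, and "> 1.0" then skips the remaining row at that distance. With real coordinates exact ties are rarer, but rows stored at the same point would hit this case.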
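myknn and dist may be the same for you; if so, you can drop one of them. (For geography, <-> computes the distance on a sphere while ST_Distance uses the spheroid by default, so the two can differ slightly.)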