Postgresql
優化postgres功能
我有以下功能,我想用它來更新狀態和距離列。
CREATE OR REPLACE FUNCTION public.check_deviation_table( ) RETURNS void LANGUAGE 'plpgsql' COST 100 VOLATILE AS $BODY$ declare status integer; geoms integer; distance float; selected_rotue_name text; Begin update ping_data set distance=sub.distance from (select min(st_distance(st_setsrid(st_transform(rl.geom,3857),3857),st_setsrid(st_transform(ST_SetSRID(ST_MakePoint(pd.longitude,pd.latitude),4326),3857),3857))) as distance ,pd.id from ping_data pd inner join order_data od on od."orderNumber"=pd.ordernumber inner join route_line rl on od."pickupLocName"||'-'||od."deliveryLocName"=rl."route name" where pd.distance is null group by pd.id)sub where ping_data.id=sub.id; update ping_data set status='No deviation' where ping_data.status is null and ping_data.distance<='3000'::int::float; update ping_data set status='Deviated' where ping_data.status is null and ping_data.distance>'3000'::int::float; End $BODY$;
它在幾千條記錄上執行良好,但是當我在
ping_data
有 642810 條記錄的整個表上執行它時,它只會繼續執行。我還在所涉及的表上創建了以下索引,但無濟於事。create index btree_ping on ping_data using btree(id); create index gist_route on route_line using gist(geom);
請參考以下詳情
Postgres 版本:PostgreSQL 12.4,Visual C++ build 1914,64-bit 編譯
PostGIS 版本:3.0 USE_GEOS=1 USE_PROJ=1 USE_STATS=1
機器詳情:Intel i5 16GB RAM Windows 10
下面共享的是所涉及的表的模式
CREATE TABLE public.route_line ( "route name" character varying COLLATE pg_catalog."default", geom geometry, routeid integer NOT NULL DEFAULT nextval('route_line_routeid_seq'::regclass), "pickup location" text COLLATE pg_catalog."default", "destination location" text COLLATE pg_catalog."default", CONSTRAINT route_line_pkey PRIMARY KEY (routeid) ) CREATE TABLE public.ping_data ( latitude double precision, longitude double precision, pingdt character varying COLLATE pg_catalog."default", shipmentid integer, addedat timestamp with time zone DEFAULT now(), ordernumber bigint, id integer NOT NULL DEFAULT nextval('ping_data_id_seq'::regclass), ping_dt timestamp with time zone, status text COLLATE pg_catalog."default", distance double precision ) CREATE TABLE public.order_data ( "assetNumber" text COLLATE pg_catalog."default", "createDt" timestamp with time zone, "deliveryLocName" text COLLATE pg_catalog."default", "orderNumber" bigint, "pickupLocName" text COLLATE pg_catalog."default", "releaseNumber" bigint, assigned character varying(1) COLLATE pg_catalog."default" DEFAULT '0'::character varying, city text COLLATE pg_catalog."default", transportername text COLLATE pg_catalog."default" )
@ErwinBrandstetter 請參考以下對您共享的更新查詢的解釋分析結果。
Update on ping_data p (cost=2199987.43..6027475.29 rows=97076 width=162) (actual time=6014063.492..6014063.861 rows=0 loops=1) -> Hash Join (cost=2199987.43..6027475.29 rows=97076 width=162) (actual time=168649.223..5998253.555 rows=545483 loops=1) Hash Cond: (sub.id = p.id) -> Subquery Scan on sub (cost=2144323.35..5958823.04 rows=97076 width=48) (actual time=167552.663..5987759.840 rows=545483 loops=1) -> GroupAggregate (cost=2144323.35..5957852.28 rows=97076 width=12) (actual time=167552.647..5984534.784 rows=545483 loops=1) Group Key: pd.id -> Sort (cost=2144323.35..2144630.74 rows=122956 width=13212) (actual time=165047.817..177245.569 rows=891149 loops=1) Sort Key: pd.id Sort Method: external merge Disk: 2274960kB -> Hash Join (cost=3815.85..47337.28 rows=122956 width=13212) (actual time=1134.751..2381.401 rows=891149 loops=1) Hash Cond: (pd.ordernumber = od.orderNumber) -> Seq Scan on ping_data pd (cost=0.00..38132.37 rows=97076 width=28) (actual time=0.122..381.017 rows=642810 loops=1) Filter: (distance IS NULL) -> Hash (cost=1451.69..1451.69 rows=1453 width=13200) (actual time=1134.316..1134.399 rows=1632 loops=1) Buckets: 512 Batches: 8 Memory Usage: 576kB -> Hash Join (cost=46.93..1451.69 rows=1453 width=13200) (actual time=159.161..1119.843 rows=1632 loops=1) Hash Cond: ((rl.route name)::text = ((od.pickupLocName || '-'::text) || od.deliveryLocName)) -> Seq Scan on route_line rl (cost=0.00..958.54 rows=954 width=13210) (actual time=25.129..121.613 rows=954 loops=1) -> Hash (cost=31.97..31.97 rows=1197 width=51) (actual time=11.256..11.259 rows=1197 loops=1) Buckets: 2048 Batches: 1 Memory Usage: 114kB -> Seq Scan on order_data od (cost=0.00..31.97 rows=1197 width=51) (actual time=9.379..10.686 rows=1197 loops=1) -> Hash (cost=38132.37..38132.37 rows=645737 width=89) (actual time=1042.800..1042.801 rows=642810 loops=1) Buckets: 32768 Batches: 32 Memory Usage: 2704kB -> Seq Scan on ping_data p (cost=0.00..38132.37 rows=645737 width=89) (actual time=0.135..674.748 rows=642810 loops=1) Planning Time: 155.156 ms Execution Time: 6016315.903 ms
有人可以幫我優化這個功能嗎?
這與您的原始版本完全相同:
CREATE OR REPLACE FUNCTION public.check_deviation_table() RETURNS void LANGUAGE sql AS $func$ UPDATE ping_data p SET distance = sub.distance , status = COALESCE(p.status , CASE WHEN sub.distance <= 3000 THEN 'No deviation' WHEN sub.distance > 3000 THEN 'Deviated' END) FROM ( SELECT min(st_distance(st_setsrid(st_transform(rl.geom,3857),3857) ,st_setsrid(st_transform(st_setsrid(st_makepoint(pd.longitude, pd.latitude),4326),3857),3857))) AS distance , pd.id FROM ping_data pd JOIN order_data od ON od."orderNumber" = pd.ordernumber JOIN route_line rl ON od."pickupLocName" || '-' || od."deliveryLocName" = rl."route name" WHERE pd.distance IS NULL GROUP BY pd.id ) sub WHERE ping_data.id = sub.id; $func$;
應該已經快了很多。
刪除所有無用的聲明變數。那麼就沒有什麼需要PL/pgSQL了。使用更簡單的 SQL 函式,或者只是簡單的查詢。
最重要的是,使用一個
UPDATE
而不是三個。您的原件將編寫兩個新的行版本,而不僅僅是一個。有幾個順序掃描而不是一個。(索引可能對此毫無用處或使其更加昂貴,但這取決於…)可能還有更多的潛力,但我會在沒有詳細資訊的情況下停在這裡。