PostgreSQL

Retrieve only the changed rows from a table with 10^7 rows

  • January 24, 2016

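I have a table with about 300 columns and about 10^7 rows, and I need to retrieve only the rows that have changed, ordered by time (not distinct rows, but changed ones). The result set may also be limited to 100 rows.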

I tried the following query, which uses the lag window function:

-- Suppose, we want to retrieve data from column1, column2 and column3 fields
-- There may be other fields, though
SELECT
 w."stamp",
 w."column1",
 w."column2",
 w."column3"
FROM (
      SELECT
        o.stamp,
        o.obj_id,
        "o"."column1",
        "o"."column2",
        "o"."column3",
        lag(o."column1") OVER (ORDER BY stamp) AS "_prev_column1",
        lag(o."column2") OVER (ORDER BY stamp) AS "_prev_column2",
        lag(o."column3") OVER (ORDER BY stamp) AS "_prev_column3" 
      FROM "table_name" o
      WHERE o.stamp BETWEEN '01.12.2015 00:00' AND '23.01.2016 00:00'
      ORDER BY o.stamp DESC
    ) AS w
WHERE 
 w.obj_id = 42 AND w.stamp BETWEEN '01.12.2015 00:00' AND '23.01.2016 00:00' AND 
 ("w"."_prev_column1", "w"."_prev_column2", "w"."_prev_column3") IS DISTINCT FROM
 ("w"."column1", "w"."column2", "w"."column3")
ORDER BY w.stamp DESC 
LIMIT 100;

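However, it takes too long to complete. Can this query be optimized, or should the problem be solved some other way (for example, with a custom function)?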


Table definition:

CREATE TABLE table_name (
 id BIGINT PRIMARY KEY NOT NULL DEFAULT nextval('ds_dyn_sequence'::regclass),
 obj_id BIGINT NOT NULL,
 stamp TIMESTAMP NOT NULL DEFAULT now(),
 column1 BIGINT,
 column2 BIGINT,
 column3 BIGINT
 -- Other fields, all BIGINT NULL
);

CREATE UNIQUE INDEX obj_id_stamp_key ON table_name USING BTREE (obj_id, stamp);

The table receives about 10^4 rows per hour. It will be limited to three months of data, so the total number of rows will be about 2*10^7.

PostgreSQL version: 9.3

Since the version is 9.3 or later, I would try rewriting the query with a LATERAL join:

SELECT
   w.stamp,
   w.column1,
   w.column2,
   w.column3
FROM 
   "table_name" AS w
 JOIN LATERAL
   ( SELECT p.column1, p.column2, p.column3
     FROM "table_name" AS p
     WHERE p.stamp < w.stamp
       AND p.stamp >= '2015-12-01'::timestamp
     ORDER BY p.stamp DESC
     LIMIT 1
   ) AS p
   ON (w.column1, w.column2, w.column3)
      IS DISTINCT FROM
      (p.column1, p.column2, p.column3)
WHERE w.obj_id = 42 
 AND w.stamp >= '2015-12-01'::timestamp
 AND w.stamp <= '2016-01-23'::timestamp 
ORDER BY w.stamp DESC 
LIMIT 100 ;
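One thing to check in the query above: the lateral subquery picks the previous row by stamp across all objects, not only for obj_id = 42. If "changed" is meant per object (an assumption; the question does not say so), a variant that also correlates on obj_id might look like the sketch below. The only difference is the marked line, which should also let the lookup use the existing (obj_id, stamp) index.

```sql
SELECT
   w.stamp,
   w.column1,
   w.column2,
   w.column3
FROM
   "table_name" AS w
 JOIN LATERAL
   ( SELECT p.column1, p.column2, p.column3
     FROM "table_name" AS p
     WHERE p.obj_id = w.obj_id          -- assumption: compare within the same object
       AND p.stamp < w.stamp
       AND p.stamp >= '2015-12-01'::timestamp
     ORDER BY p.stamp DESC               -- latest earlier row comes first
     LIMIT 1
   ) AS p
   ON (w.column1, w.column2, w.column3)
      IS DISTINCT FROM
      (p.column1, p.column2, p.column3)
WHERE w.obj_id = 42
  AND w.stamp >= '2015-12-01'::timestamp
  AND w.stamp <= '2016-01-23'::timestamp
ORDER BY w.stamp DESC
LIMIT 100;
```

Because this is an inner join, a row with no earlier row inside the range (the first one for the object) is not returned in either version.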

Test it with the index you already have, and also with an index on (obj_id, stamp DESC).
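As a rough sketch, the second index could be created and compared as follows. The index name is hypothetical, and the EXPLAIN statement uses a simplified stand-in query; the same prefix can be put in front of the full rewritten query.

```sql
-- Hypothetical index name; the column list is the one suggested above.
CREATE INDEX table_name_obj_id_stamp_desc_idx
    ON table_name USING btree (obj_id, stamp DESC);

-- Run once with each index in place and compare the plans and timings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT stamp, column1, column2, column3
FROM table_name
WHERE obj_id = 42
  AND stamp >= '2015-12-01'::timestamp
  AND stamp <= '2016-01-23'::timestamp
ORDER BY stamp DESC
LIMIT 100;
```

Note that EXPLAIN with the ANALYZE option actually executes the statement, so each run costs as much as the query itself.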


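Unrelated to efficiency: are you sure you want BETWEEN for the timestamps? It is more common to use inclusive-exclusive (half-open) ranges. In other words, shouldn't this: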

 AND w.stamp <= '2016-01-23'::timestamp 

be this instead?

 AND w.stamp  < '2016-01-23'::timestamp 

With BETWEEN (and <=), the range you search is 53 full days plus 1 microsecond (or whatever the precision of timestamp is).
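A small sketch of the difference, using the table and dates from the question. The first statement computes the length of the intended range (31 days of December plus 22 days of January). The second uses the half-open form, so a row stamped exactly 2016-01-23 00:00:00 belongs to the next range rather than this one.

```sql
-- Length of the intended range as an interval.
SELECT '2016-01-23'::timestamp - '2015-12-01'::timestamp AS range_length;

-- Half-open filter: includes 2015-12-01 00:00:00, excludes 2016-01-23 00:00:00.
SELECT count(*)
FROM table_name
WHERE obj_id = 42
  AND stamp >= '2015-12-01'::timestamp
  AND stamp <  '2016-01-23'::timestamp;
```

With half-open ranges, consecutive periods can be queried back to back without counting any boundary row twice.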

Source: https://dba.stackexchange.com/questions/127101