PostgreSQL

Efficiently get the top two rows per group

  • March 10, 2021

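Currently, I use a DISTINCT ON query to get a list of the latest post per author.

But how can I get the previous-to-latest post per author? In other words, how can I make DISTINCT ON return the second row per group instead of the first?

I need this to compare the latest post with the previous one.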

CREATE TABLE posts (
 title varchar(30),
 author varchar(30),
 created_at date
);

INSERT INTO posts VALUES
('Johns first post', 'John', 'January 1, 2021'),
('Johns second post', 'John', 'January 2, 2021'),
('Johns third post', 'John', 'January 3, 2021'),
('Mikes first post', 'Mike', 'January 1, 2021'),
('Mikes second post', 'Mike', 'January 2, 2021'),
('Mikes third post', 'Mike', 'January 3, 2021');

This query selects the latest post for each author (in my example, the third post for each):

SELECT DISTINCT ON (author) * FROM posts ORDER BY author ASC, created_at DESC

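I also need the previous post for each author (in my example, the second post for each).

I don't want to use window functions, because my table is fairly big and I suspect window functions would be slow.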

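DISTINCT ON is only good for getting one (distinct) row per group, and only the one that sorts first in some order. Even then, it is only efficient with *few* rows per group.

I'll assume a big table with many posts per author (the typical case).

Simple and slow

You can build on row_number() in a subquery: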

SELECT *
FROM  (
  SELECT *, row_number() OVER (PARTITION BY author ORDER BY created_at DESC NULLS LAST) AS post_num
  FROM   posts
  ) p
WHERE  post_num < 3;

DESC NULLS LAST is needed because all of your columns can be NULL.

Really, you want all columns to be NOT NULL and created_at to be timestamptz. (See below for more.)
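If only the previous-to-latest post is wanted, the same subquery can be filtered on a single row number. This is a minimal sketch against the posts table from the question. It still uses a window function, and it silently omits authors who have only one post.

```sql
-- Sketch: return only the second-newest post per author.
-- row_number() numbers each author's posts from newest (1) to oldest.
SELECT title, author, created_at
FROM  (
   SELECT *, row_number() OVER (PARTITION BY author
                                ORDER BY created_at DESC NULLS LAST) AS post_num
   FROM   posts
   ) p
WHERE  post_num = 2;   -- 2 = previous-to-latest
```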

The query will use a sequential scan, which is very inefficient for this case. (Just as you suspected.)
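One way to check which plan is actually chosen is to prefix the query with EXPLAIN. The sketch below assumes the posts table from the question. Note that the ANALYZE option really executes the query, and that on a tiny sample table the planner tends to pick a sequential scan regardless of available indexes, so the result is only meaningful on realistic data volumes.

```sql
-- Sketch: inspect the execution plan of the row_number() query.
-- ANALYZE runs the query and reports actual timings; BUFFERS adds I/O counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM  (
   SELECT *, row_number() OVER (PARTITION BY author
                                ORDER BY created_at DESC NULLS LAST) AS post_num
   FROM   posts
   ) p
WHERE  post_num < 3;
```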

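Sophisticated and fast

You want to use an index efficiently. Assuming the table design stays as primitive as shown, with an index like: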

CREATE INDEX ON posts (author DESC NULLS LAST, created_at DESC NULLS LAST);

We can make it happen with some sophistication:

WITH RECURSIVE cte AS (
  (
  SELECT *
  FROM   posts
  ORDER  BY author DESC NULLS LAST, created_at DESC NULLS LAST
  LIMIT  1
  )

  UNION ALL
  SELECT p.*
  FROM   cte c
  CROSS  JOIN LATERAL (
     SELECT *
     FROM   posts p
     WHERE  p.author < c.author  -- lateral reference
     ORDER  BY author DESC NULLS LAST, created_at DESC NULLS LAST
     LIMIT  1
     ) p
)
SELECT *, 1 AS post_num
FROM   cte

UNION ALL
SELECT p.*
FROM   cte c
CROSS  JOIN LATERAL (
  SELECT *, 2 AS post_num
  FROM   posts p
  WHERE  p.author = c.author
  AND    p.created_at < c.created_at   -- assuming no two posts with same date
  ORDER  BY created_at DESC NULLS LAST
  LIMIT  1
  ) p;

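The first step is a classic recursive CTE that fetches the first (latest) post per author.

The second step fetches the next post for each author in a LATERAL subquery, again using the index.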
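To see the first step in isolation, here is a reduced sketch of the same recursive pattern that only walks the distinct author values. It emulates a loose index scan: each iteration looks up the next smaller author through the index instead of reading every row. It assumes the posts table and the index from above, and it skips NULL authors.

```sql
-- Sketch: distinct authors via recursive CTE (emulated loose index scan).
WITH RECURSIVE cte AS (
   (  -- anchor: the greatest non-null author
   SELECT author
   FROM   posts
   ORDER  BY author DESC NULLS LAST
   LIMIT  1
   )
   UNION ALL
   SELECT (SELECT p.author            -- next smaller author, or NULL if none
           FROM   posts p
           WHERE  p.author < c.author
           ORDER  BY p.author DESC NULLS LAST
           LIMIT  1)
   FROM   cte c
   WHERE  c.author IS NOT NULL        -- stop once no smaller author was found
)
SELECT author
FROM   cte
WHERE  author IS NOT NULL;
```

The full query above carries whole rows through the recursion rather than just the author, so the latest post comes along with each step.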

Simple and fast

In a proper relational design, you would have a separate author table, like:

CREATE TABLE author (
 author_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, author    text NOT NULL
);

INSERT INTO author(author) VALUES
 ('John')
, ('Mike');

CREATE TABLE post (
 post_id    int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, author_id  int NOT NULL REFERENCES author
, title      varchar(30) NOT NULL
, created_at timestamptz NOT NULL DEFAULT now()
);

INSERT INTO post (author_id, title, created_at) VALUES
 (1, 'Johns first post',  'January 1, 2021')
, (1, 'Johns second post', 'January 2, 2021')
, (1, 'Johns third post',  'January 3, 2021')
, (2, 'Mikes first post',  'January 1, 2021')
, (2, 'Mikes second post', 'January 2, 2021')
, (2, 'Mikes third post',  'January 3, 2021')
;

Then the index can simply be:

CREATE INDEX ON post (author_id, created_at);

And we can have a very simple and very efficient query:

SELECT p.*
FROM   author a
CROSS  JOIN LATERAL (
  SELECT *
  FROM   post
  WHERE  author_id = a.author_id
  ORDER  BY created_at DESC
  LIMIT  2
  ) p;
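Since the stated goal is to compare the latest post with the previous one, it may be more convenient to get both in a single row per author. The following is a sketch under the same normalized design. The aliases latest and previous and the output column names are made up for illustration, and it assumes no two posts by the same author share the same created_at.

```sql
-- Sketch: latest and previous post side by side, one row per author.
SELECT a.author
     , latest.title        AS latest_title
     , latest.created_at   AS latest_at
     , previous.title      AS previous_title
     , previous.created_at AS previous_at
FROM   author a
CROSS  JOIN LATERAL (        -- newest post; authors without posts drop out
   SELECT title, created_at
   FROM   post
   WHERE  author_id = a.author_id
   ORDER  BY created_at DESC
   LIMIT  1
   ) latest
LEFT   JOIN LATERAL (        -- next older post; NULLs if there is none
   SELECT title, created_at
   FROM   post
   WHERE  author_id = a.author_id
   AND    created_at < latest.created_at
   ORDER  BY created_at DESC
   LIMIT  1
   ) previous ON true;
```

The LEFT JOIN keeps authors who have only a single post, with NULL in the previous_* columns. Both subqueries can be served by the index on (author_id, created_at).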

Source: https://dba.stackexchange.com/questions/286627