PostgreSQL
Efficiently get the top two rows per group
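Currently, I use a DISTINCT ON query to get a list of the latest post per author. But how can I get the previous post for each author, the one just before the latest? In other words, how can I make DISTINCT ON return the second row of each group instead of the first? I need this to compare the most recent post with the one before it.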
CREATE TABLE posts (
   title      varchar(30),
   author     varchar(30),
   created_at date
);

INSERT INTO posts VALUES
   ('Johns first post',  'John', 'January 1, 2021'),
   ('Johns second post', 'John', 'January 2, 2021'),
   ('Johns third post',  'John', 'January 3, 2021'),
   ('Mikes first post',  'Mike', 'January 1, 2021'),
   ('Mikes second post', 'Mike', 'January 2, 2021'),
   ('Mikes third post',  'Mike', 'January 3, 2021');
This query selects the latest post per author. (In my example, that is the third post for each author.)
SELECT DISTINCT ON (author) *
FROM   posts
ORDER  BY author ASC, created_at DESC;
db<>fiddle here
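I also need the previous post per author. (In my example, that is the second post for each author.)
I would rather not use window functions, because my table is fairly large and I suspect window functions would be slow.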
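DISTINCT ON is only good for getting one (distinct) row per group, and only the row that sorts first under some ORDER BY. Even then, it is only *efficient* when there are few rows per group. I will assume a big table with many posts per author (the typical case).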
Simple and slow
You can build on row_number() in a subquery:
SELECT *
FROM  (
   SELECT *, row_number() OVER (PARTITION BY author
                                ORDER BY created_at DESC NULLS LAST) AS post_num
   FROM   posts
   ) p
WHERE  post_num < 3;
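DESC NULLS LAST is needed because all of your columns can be NULL. You really want all columns to be NOT NULL, and created_at to be timestamptz. (More on that below.) The query will use a sequential scan, which is very inefficient for this case. (Just as you suspected.)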
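As a rough sketch of that schema advice, the existing table could be tightened in place. This assumes the question's posts table with no NULL values already stored; the TEMP table and the single sample row are only there to keep the example self-contained.

```sql
-- Stand-in for the question's table (TEMP only so the sketch cleans up after itself).
CREATE TEMP TABLE posts (
   title      varchar(30),
   author     varchar(30),
   created_at date
);
INSERT INTO posts VALUES ('Johns first post', 'John', 'January 1, 2021');

-- Forbid NULLs. Each SET NOT NULL fails if the column already contains a NULL.
ALTER TABLE posts
   ALTER COLUMN title      SET NOT NULL,
   ALTER COLUMN author     SET NOT NULL,
   ALTER COLUMN created_at SET NOT NULL;

-- date -> timestamptz: each date becomes midnight in the session's time zone.
ALTER TABLE posts
   ALTER COLUMN created_at TYPE timestamptz;
```

Changing the column type rewrites the table under an exclusive lock, so on a big table this is something to schedule rather than run casually. Once the columns are NOT NULL, the NULLS LAST clauses in the queries are no longer needed.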
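Sophisticated and fast
You want to use an index efficiently. Assuming a table design as primitive as the one shown, with an index like:
CREATE INDEX ON posts (author DESC NULLS LAST, created_at DESC NULLS LAST);
we can make it work with some sophistication: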
WITH RECURSIVE cte AS (
   (
   SELECT *
   FROM   posts
   ORDER  BY author DESC NULLS LAST, created_at DESC NULLS LAST
   LIMIT  1
   )
   UNION ALL
   SELECT p.*
   FROM   cte c
   CROSS  JOIN LATERAL (
      SELECT *
      FROM   posts p
      WHERE  p.author < c.author  -- lateral reference
      ORDER  BY author DESC NULLS LAST, created_at DESC NULLS LAST
      LIMIT  1
      ) p
   )
SELECT *, 1 AS post_num
FROM   cte

UNION ALL
SELECT p.*
FROM   cte c
CROSS  JOIN LATERAL (
   SELECT *, 2 AS post_num
   FROM   posts p
   WHERE  p.author = c.author
   AND    p.created_at < c.created_at  -- assuming no two posts with same date
   ORDER  BY created_at DESC NULLS LAST
   LIMIT  1
   ) p;
db<>fiddle here
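To make the structure of that query easier to follow, here is its recursive part on its own, as a sketch against the question's sample data. Run in isolation it returns one row per author (the latest post), which is the same result DISTINCT ON produces, but obtained by hopping from author to author instead of reading every row. The TEMP table and the inner alias p1 are choices made for this example only.

```sql
-- Sample data from the question (TEMP only to keep the sketch self-contained).
CREATE TEMP TABLE posts (
   title      varchar(30),
   author     varchar(30),
   created_at date
);
INSERT INTO posts VALUES
   ('Johns first post',  'John', 'January 1, 2021'),
   ('Johns second post', 'John', 'January 2, 2021'),
   ('Johns third post',  'John', 'January 3, 2021'),
   ('Mikes first post',  'Mike', 'January 1, 2021'),
   ('Mikes second post', 'Mike', 'January 2, 2021'),
   ('Mikes third post',  'Mike', 'January 3, 2021');

CREATE INDEX ON posts (author DESC NULLS LAST, created_at DESC NULLS LAST);

WITH RECURSIVE cte AS (
   (  -- anchor: the very first index entry = latest post of the "largest" author
   SELECT *
   FROM   posts
   ORDER  BY author DESC NULLS LAST, created_at DESC NULLS LAST
   LIMIT  1
   )
   UNION ALL
   SELECT p.*
   FROM   cte c
   CROSS  JOIN LATERAL (
      -- recursive step: jump to the latest post of the next smaller author
      SELECT *
      FROM   posts p1
      WHERE  p1.author < c.author
      ORDER  BY author DESC NULLS LAST, created_at DESC NULLS LAST
      LIMIT  1
      ) p
   )
SELECT * FROM cte;   -- one row per author; recursion stops when no smaller author exists
```

On a table this small the planner will most likely pick a sequential scan anyway; the index only pays off with many rows per author.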
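The first step is a classic recursive CTE that gets the latest post of each author.
The second step fetches the next post per author in a LATERAL subquery, again using the index.
Simple and fast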
In a proper relational design, you would have a separate author table, like:
CREATE TABLE author (
   author_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
 , author    text NOT NULL
);

INSERT INTO author(author) VALUES
   ('John')
 , ('Mike');

CREATE TABLE post (
   post_id    int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
 , author_id  int NOT NULL REFERENCES author
 , title      varchar(30) NOT NULL
 , created_at timestamptz NOT NULL DEFAULT now()
);

INSERT INTO post (author_id, title, created_at) VALUES
   (1, 'Johns first post',  'January 1, 2021')
 , (1, 'Johns second post', 'January 2, 2021')
 , (1, 'Johns third post',  'January 3, 2021')
 , (2, 'Mikes first post',  'January 1, 2021')
 , (2, 'Mikes second post', 'January 2, 2021')
 , (2, 'Mikes third post',  'January 3, 2021')
;
Then the index can simply be:
CREATE INDEX ON post (author_id, created_at);
And we can have a very simple and very efficient query:
SELECT p.*
FROM   author a
CROSS  JOIN LATERAL (
   SELECT *
   FROM   post
   WHERE  author_id = a.author_id
   ORDER  BY created_at DESC
   LIMIT  2
   ) p;
db<>fiddle here
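Since the stated goal is to compare the latest post with the one before it, a variation on the same idea can return both side by side in one row per author. This is only a sketch under the normalized design above: the aliases l and pr and the output column names are made up for the example, and the TEMP tables just keep it self-contained.

```sql
CREATE TEMP TABLE author (
   author_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
 , author    text NOT NULL
);
INSERT INTO author(author) VALUES ('John'), ('Mike');

CREATE TEMP TABLE post (
   post_id    int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
 , author_id  int NOT NULL REFERENCES author
 , title      varchar(30) NOT NULL
 , created_at timestamptz NOT NULL DEFAULT now()
);
INSERT INTO post (author_id, title, created_at) VALUES
   (1, 'Johns first post',  'January 1, 2021')
 , (1, 'Johns second post', 'January 2, 2021')
 , (1, 'Johns third post',  'January 3, 2021')
 , (2, 'Mikes first post',  'January 1, 2021')
 , (2, 'Mikes second post', 'January 2, 2021')
 , (2, 'Mikes third post',  'January 3, 2021');

CREATE INDEX ON post (author_id, created_at);

SELECT a.author
     , l.title  AS latest_title  , l.created_at  AS latest_at
     , pr.title AS previous_title, pr.created_at AS previous_at
FROM   author a
CROSS  JOIN LATERAL (           -- latest post; authors without posts drop out
   SELECT title, created_at
   FROM   post
   WHERE  author_id = a.author_id
   ORDER  BY created_at DESC
   LIMIT  1
   ) l
LEFT   JOIN LATERAL (           -- second-latest post; NULLs if there is only one
   SELECT title, created_at
   FROM   post
   WHERE  author_id = a.author_id
   ORDER  BY created_at DESC
   OFFSET 1
   LIMIT  1
   ) pr ON true;
```

OFFSET 1 LIMIT 1 is what picks "the second row of each group". If two posts by the same author can share a created_at, the order between them is arbitrary; adding post_id DESC as a second sort key in both subqueries (and to the index) would make it deterministic.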