PostgreSQL

How do I remove duplicates and sort within a subquery?

  • June 21, 2022

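I have records with a "title" column. I split each title on spaces and run a full-text search for every word, then store the results in a materialized view.

This works, but the same asset can show up once per matching word, so I get duplicates, and I also need the results ordered by their rank. I can do one or the other, but not both. How can I do both?

My query: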

SELECT
   asset.id,
   (
       select
           jsonb_agg(resultsForWord)
       FROM
           UNNEST(
               string_to_array(TRIM(regexp_replace(asset.title, '[^a-zA-Z+]', ' ', 'g')), ' ')
           ) as word
           INNER JOIN LATERAL 
           (
               SELECT
                   searchresult.id,
                   searchresult.title,
                   ts_rank(ts, to_tsquery ('english', word)) rank
               FROM
                   assets searchresult
               WHERE
                   searchresult.id != asset.id AND
                   ts_rank(ts, to_tsquery ('english', word)) > 0.5
               LIMIT 5
           ) AS resultsForWord ON 1=1
    ) results
FROM
   assets asset
WHERE asset.id = 'abc'
GROUP BY asset.id;

To filter out the duplicates I just used

jsonb_agg(DISTINCT resultsForWord)

To sort by rank I just used

jsonb_agg(resultsForWord ORDER BY rank DESC)

When I do both, I get:

ERROR: in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list
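
For context, PostgreSQL requires that when an aggregate uses DISTINCT, every ORDER BY expression is also one of the aggregate's arguments. Here the argument is the whole row, while the sort key is only one column of it. A minimal reproduction, assuming made-up literal rows in place of the real search results (the alias r and its values are purely illustrative):

```sql
-- Hypothetical stand-in rows; ids and ranks are invented for illustration.
-- The aggregate argument is the whole row r, but the sort key is r.rank,
-- which is not itself an argument, so PostgreSQL rejects the call.
SELECT jsonb_agg(DISTINCT r ORDER BY r.rank DESC)
FROM (VALUES ('b', 0.6), ('d', 0.7), ('d', 0.7)) AS r(id, rank);
```

Sorting by the argument itself, as in ORDER BY r, should be accepted, but rows are then compared column by column starting with id, which is not the ordering wanted here.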

Sample data:

CREATE TABLE assets (
  id TEXT PRIMARY KEY,
  title TEXT,
  ts tsvector
    GENERATED ALWAYS AS (setweight(to_tsvector('english', coalesce(title, '')), 'A')) STORED
);

INSERT INTO assets (id, title) VALUES
  ('a', 'Hello world!'),
  ('b', 'Hello sir'),
  ('c', 'I am above the world'),
  ('d', 'World hello');
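
To get a feel for how the 0.5 threshold behaves on this sample data, the rank of each title for a single word can be inspected directly. This is only a sketch: it rebuilds the ts expression inline instead of reading the assets table, and the search word 'hello' is an arbitrary example.

```sql
-- Titles copied from the sample data above; the tsvector is rebuilt inline
-- with the same expression as the generated ts column (weight 'A').
SELECT t.title,
       ts_rank(
           setweight(to_tsvector('english', t.title), 'A'),
           to_tsquery('english', 'hello')   -- 'hello' is just an example word
       ) AS rank
FROM (VALUES
    ('Hello world!'),
    ('Hello sir'),
    ('I am above the world'),
    ('World hello')
) AS t(title)
ORDER BY rank DESC;
```

Titles that do not contain the word rank 0, so they fall below the threshold used in the query.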

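It looks like you should flip the order of the joins: start from assets and move the UNNEST inside the lateral subquery, so that each candidate asset joins to at most one row (its best-ranking word). That way no duplicates are produced in the first place, and you can sort by rank without DISTINCT.

You can also drop the outer GROUP BY. It doesn't seem to be necessary.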

SELECT
   asset.id,
   (
       select
           jsonb_agg(results ORDER BY results.rank DESC)
       FROM (
           SELECT
               searchresult.id,
               searchresult.title,
               resultsForWord.rank
           FROM
               assets searchresult
           CROSS JOIN LATERAL 
           (
               SELECT ts_rank(ts, to_tsquery ('english', word)) rank
               FROM UNNEST(
                   string_to_array(TRIM(regexp_replace(asset.title, '[^a-zA-Z+]', ' ', 'g')), ' ')
               ) as word
               WHERE ts_rank(ts, to_tsquery ('english', word)) > 0.5
               ORDER BY rank DESC
               LIMIT 1
           ) AS resultsForWord
           WHERE
               searchresult.id != asset.id
           ORDER BY rank DESC
           LIMIT 5
       ) results
    ) results
FROM
   assets asset
WHERE asset.id = 'a';
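
If you would rather keep the original join order, another option is to deduplicate in a subquery before aggregating, so that jsonb_agg only needs ORDER BY and never DISTINCT. The following is a sketch under stated assumptions, not a drop-in replacement: the hits rows are invented stand-ins for the per-word search results, which in the real query would come from the UNNEST ... LATERAL join.

```sql
-- Hypothetical per-word hits: asset 'd' matched two words, so it appears twice.
-- The rank values are invented for illustration.
SELECT jsonb_agg(deduped ORDER BY deduped.rank DESC) AS results
FROM (
    -- Collapse to one row per asset, keeping its best rank.
    SELECT hits.id, hits.title, max(hits.rank) AS rank
    FROM (VALUES
        ('b', 'Hello sir',            0.61),
        ('d', 'World hello',          0.61),
        ('c', 'I am above the world', 0.55),
        ('d', 'World hello',          0.70)
    ) AS hits(id, title, rank)
    GROUP BY hits.id, hits.title
) AS deduped;
```

Because the duplicates are removed by GROUP BY before the aggregate runs, the restriction on combining DISTINCT with ORDER BY never comes into play.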

db<>fiddle

Source: https://dba.stackexchange.com/questions/313548