Postgresql
編寫此查詢的更好方法?
我的數據庫(Postgresql 10)有以下架構:
CREATE TABLE "PulledTexts" ( "Id" serial PRIMARY KEY, "BaseText" TEXT, "CleanText" TEXT ); CREATE TABLE "UniqueWords" ( "Id" serial PRIMARY KEY, "WordText" TEXT ); CREATE TABLE "WordTexts" ( "Id" serial PRIMARY KEY, "TextIdId" INTEGER REFERENCES "PulledTexts", "WordIdId" INTEGER REFERENCES "UniqueWords" ); CREATE INDEX "IX_WordTexts_TextIdId" ON "WordTexts" ("TextIdId"); CREATE INDEX "IX_WordTexts_WordIdId" ON "WordTexts" ("WordIdId");
一些樣本數據:
INSERT INTO public."PulledTexts" ("Id", "BaseText", "CleanText") VALUES (1, 'automate business audit', null), (2, 'audit trial', null), (3, 'trial', null), (4, 'audit', null), (5, 'fresh report', null), (6, 'fresh audit', null), (7, 'automate this script', null), (8, 'im trying here', null), (9, 'automate this business', null), (10, 'lateral', null); INSERT INTO public."UniqueWords" ("Id", "WordText") VALUES (1, 'trial'), (2, 'audit'), (3, 'creation'), (4, 'business'), (5, 'automate'); INSERT INTO public."WordTexts" ("Id", "TextIdId", "WordIdId") VALUES (1, 1, 2), (2, 1, 4), (3, 1, 5), (4, 2, 1), (5, 3, 1), (6, 4, 2), (7, 6, 2), (8, 7, 5), (9, 9, 4), (10, 9, 5), (11, 2, 2);
數據庫本身是通過實體框架遷移創建的。
我想知道是否有更好、更高效的方式來編寫此查詢,因為該
WordTexts
表將包含數十萬條記錄,最終包含數百萬條記錄。如果對這類查詢更有效,我也願意採用 NoSql 路線。SELECT * FROM "PulledTexts" WHERE "Id" IN ( SELECT "TextIdId" FROM "WordTexts" AS "wordTexts" LEFT JOIN "UniqueWords" AS "wordTexts.WordId" ON "wordTexts"."WordIdId" = "wordTexts.WordId"."Id" WHERE "wordTexts.WordId"."WordText" = 'automate' OR "TextIdId" IN ( SELECT "TextIdId" and1 from "WordTexts" AS "wordTexts" LEFT JOIN "UniqueWords" AS "wordTexts.WordId" ON "wordTexts"."WordIdId" = "wordTexts.WordId"."Id" where "wordTexts.WordId"."WordText" = 'audit' INTERSECT SELECT "TextIdId" and2 from "WordTexts" AS "wordTexts" LEFT JOIN "UniqueWords" AS "wordTexts.WordId" ON "wordTexts"."WordIdId" = "wordTexts.WordId"."Id" WHERE "wordTexts.WordId"."WordText" = 'trial' ) );
在較高級別上,此查詢應該返回其 ID 與子查詢返回的 ID 匹配的 PulledTexts。子查詢只是為了返回包含
('audit' AND 'trial') OR 'automate'
來自 UniqueWords 表的 PulledTexts 的 ID 列表。這正是我給出的範例查詢所做的。WordTexts 表是 UniqueWord 到 PulledText 的簡單映射。
雖然您的查詢是有效的,但我有很多不同的地方。
- 如果可以避免的話,不要在 Postgres 中使用 CaMeL-case 名稱。您的未命名實體框架可能會強迫您使用這種廢話,但我不想處理雙引號混亂,因此我在刪除所有雙引號後使用您的架構進行了測試 - 有效地使所有標識符小寫。
- 不要使用難以辨認或非法的列和表別名(如
"wordTexts.WordId"
)。這是品味和風格(以及理智)的問題,但您也在不AS
應該的地方省略了關鍵字,並將其保留在可以省略的地方。
- 我還格式化了更多格式,以便我更容易理解查詢。最後一部分是完全可選的。但是使用一些一致的格式樣式。
到達這裡:
SELECT * FROM PulledTexts WHERE Id IN ( SELECT w.TextIdId FROM WordTexts w -- AS can be omitted for table alias LEFT JOIN UniqueWords u ON w.WordIdId = u.Id -- LEFT JOIN might be necessary here WHERE u.WordText = 'automate' OR w.TextIdId IN ( SELECT w.TextIdId -- AS and1 -- column alias only documentation here, not visible FROM WordTexts w JOIN UniqueWords u ON w.WordIdId = u.Id -- LEFT JOIN misleading here WHERE u.WordText = 'audit' INTERSECT SELECT w.TextIdId -- AS and2 -- but don't omit AS for column alias FROM WordTexts w JOIN UniqueWords u ON w.WordIdId = u.Id WHERE u.WordText = 'trial' ) );
可以簡化為:
SELECT * FROM ( SELECT w.TextIdId AS Id FROM WordTexts w JOIN UniqueWords u ON w.WordIdId = u.Id -- now we don't need LEFT any more WHERE u.WordText = 'automate' UNION SELECT w.TextIdId FROM WordTexts w JOIN UniqueWords u ON w.WordIdId = u.Id WHERE u.WordText = 'audit' INTERSECT SELECT w.TextIdId FROM WordTexts w JOIN UniqueWords u ON w.WordIdId = u.Id WHERE u.WordText = 'trial' ) w JOIN PulledTexts p USING (Id)
我們不需要額外的括號,因為根據手冊:
INTERSECT
比 結合得更緊UNION
。也就是說,A UNION B INTERSECT C
將被讀作A UNION (B INTERSECT C)
.但這在替換多個相交的子查詢時會更快:
SELECT * FROM ( SELECT w.TextIdId AS Id FROM WordTexts w JOIN UniqueWords u ON w.WordIdId = u.Id WHERE u.WordText = 'automate' UNION SELECT TextIdId FROM WordTexts w1 JOIN WordTexts w2 USING (TextIdId) WHERE w1.WordIdId = (SELECT Id FROM UniqueWords WHERE WordText = 'audit') AND w2.WordIdId = (SELECT Id FROM UniqueWords WHERE WordText = 'trial') ) w JOIN PulledTexts p USING (Id)
該
INTERSECT
部分可以轉換為關係劃分問題。就在昨天的相關答案中的解釋:db<>在這裡擺弄
對性能最重要的是擁有正確的索引。您可能應該對in table有一個**
UNIQUE
約束,它**按此順序在這兩個列上實現目前缺少的索引。(WordIdId, TextIdId)``WordTexts