PostgreSQL: How can I optimize the performance of a query that uses CTEs and has a jsonb column?
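I am querying three joined tables. I use CTEs to flatten one of the tables' jsonb columns (convert it from jsonb into tabular form), and then query that dynamically generated table so that I can count individual values inside the jsonb column.
I am using this code: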
    WITH students_query AS (
        SELECT student_number, "examYear", school_name, subjects
        FROM students
        INNER JOIN schools ON students.school_id = schools.school_number
        INNER JOIN results ON results.id = students.subjects_id
        WHERE "examYear" = '2010' AND "examType" = 'CSEE'
    ),
    subjects_array AS (
        SELECT jsonb_array_elements(subjects) AS subject_list
        FROM students_query
    ),
    unwrapper AS (
        SELECT x.*
        FROM subjects_array,
             jsonb_to_record(subject_list) AS x(subject varchar(25), grade varchar(2))
    ),
    failures AS (
        SELECT COUNT(*)::numeric AS fails
        FROM unwrapper
        WHERE "subject" = 'B/MATH' AND "grade" IN ('D', 'F', 'X')
    ),
    passes AS (
        SELECT COUNT(*)::numeric AS passes
        FROM unwrapper
        WHERE "subject" = 'B/MATH' AND "grade" IN ('A', 'B', 'C')
    ),
    final AS (
        SELECT COUNT(*)::numeric AS allStudents
        FROM unwrapper
        WHERE "subject" = 'B/MATH'
    )
    SELECT fails AS "Number of Fails",
           passes AS "Number of passes",
           allStudents AS "Number of All Students",
           ROUND((fails/allStudents *100), 2) AS "Percent of Fails",
           ROUND((passes/allStudents *100), 2) AS "Percent of Passes"
    FROM failures, passes, final;
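The problem is that it is slow.
This particular query takes about 10 seconds to complete, but it is an important query that most users will run, so I would really like to optimize it.
Steps I have taken
**Indexes:** I have created some indexes, but I am not sure whether they make any difference.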
    drop index if exists fk_idx_results;
    drop index if exists idx_results_subjects;
    drop index if exists fk_idx_schools;
    drop index if exists idx_schools_name;
    drop index if exists fk_idx_students;
    create index fk_idx_results on results("id");
    create index idx_results_subjects on results("subjects", "examYear");
    create index fk_idx_schools on schools("school_number");
    create index idx_schools_name on schools("school_name");
    create index fk_idx_students on students("id", "subjects_id", "school_id");
**Settings:** I also added some settings before the query, which gets it to roughly 9 seconds.
    SET cpu_index_tuple_cost = .0005;
    SET random_page_cost = 2;
I am now asking for help: I am new to PostgreSQL and to large-database optimization in general.
Here is the EXPLAIN ANALYZE output for the query.

    Nested Loop  (cost=6330354.67..6330354.76 rows=1 width=160) (actual time=10103.862..10103.865 rows=1 loops=1)
      CTE students_query
        ->  Hash Join  (cost=390430.66..607236.38 rows=476181 width=347) (actual time=1378.337..3543.161 rows=458487 loops=1)
              Hash Cond: (students.school_id = schools.school_number)
              ->  Hash Join  (cost=390237.54..600495.77 rows=476181 width=326) (actual time=1376.496..3407.941 rows=458487 loops=1)
                    Hash Cond: (students.subjects_id = results.id)
                    ->  Seq Scan on students  (cost=0.00..93850.85 rows=4658285 width=36) (actual time=0.029..616.052 rows=4658285 loops=1)
                    ->  Hash  (cost=362894.28..362894.28 rows=476181 width=340) (actual time=1375.636..1375.636 rows=458487 loops=1)
                          Buckets: 16384  Batches: 64  Memory Usage: 2876kB
                          ->  Seq Scan on results  (cost=0.00..362894.28 rows=476181 width=340) (actual time=0.020..1187.420 rows=458487 loops=1)
                                Filter: (("examYear" = '2010'::text) AND ("examType" = 'CSEE'::text))
                                Rows Removed by Filter: 4199798
              ->  Hash  (cost=113.61..113.61 rows=6361 width=33) (actual time=1.831..1.831 rows=6361 loops=1)
                    Buckets: 8192  Batches: 1  Memory Usage: 469kB
                    ->  Seq Scan on schools  (cost=0.00..113.61 rows=6361 width=33) (actual time=0.009..0.857 rows=6361 loops=1)
      CTE subjects_array
        ->  CTE Scan on students_query  (cost=0.00..246423.67 rows=47618100 width=32) (actual time=1378.350..4638.517 rows=3473367 loops=1)
      CTE unwrapper
        ->  Nested Loop  (cost=0.00..1904724.00 rows=47618100 width=80) (actual time=1378.369..8489.830 rows=3473367 loops=1)
              ->  CTE Scan on subjects_array  (cost=0.00..952362.00 rows=47618100 width=32) (actual time=1378.351..5344.083 rows=3473367 loops=1)
              ->  Function Scan on jsonb_to_record x  (cost=0.00..0.01 rows=1 width=80) (actual time=0.001..0.001 rows=1 loops=3473367)
      CTE failures
        ->  Aggregate  (cost=1249984.05..1249984.07 rows=1 width=32) (actual time=9379.408..9379.409 rows=1 loops=1)
              ->  CTE Scan on unwrapper  (cost=0.00..1249975.13 rows=3571 width=0) (actual time=1378.423..9342.868 rows=387778 loops=1)
                    Filter: (((subject)::text = 'B/MATH'::text) AND ((grade)::text = ANY ('{D,F,X}'::text[])))
                    Rows Removed by Filter: 3085589
      CTE passes
        ->  Aggregate  (cost=1249984.05..1249984.07 rows=1 width=32) (actual time=365.217..365.217 rows=1 loops=1)
              ->  CTE Scan on unwrapper unwrapper_1  (cost=0.00..1249975.13 rows=3571 width=0) (actual time=0.093..363.892 rows=24002 loops=1)
                    Filter: (((subject)::text = 'B/MATH'::text) AND ((grade)::text = ANY ('{A,B,C}'::text[])))
                    Rows Removed by Filter: 3449365
      CTE final
        ->  Aggregate  (cost=1072002.48..1072002.49 rows=1 width=32) (actual time=359.222..359.222 rows=1 loops=1)
              ->  CTE Scan on unwrapper unwrapper_2  (cost=0.00..1071407.25 rows=238090 width=0) (actual time=0.005..339.228 rows=411822 loops=1)
                    Filter: ((subject)::text = 'B/MATH'::text)
                    Rows Removed by Filter: 3061545
      ->  Nested Loop  (cost=0.00..0.05 rows=1 width=64) (actual time=9744.630..9744.632 rows=1 loops=1)
            ->  CTE Scan on failures  (cost=0.00..0.02 rows=1 width=32) (actual time=9379.410..9379.411 rows=1 loops=1)
            ->  CTE Scan on passes  (cost=0.00..0.02 rows=1 width=32) (actual time=365.219..365.220 rows=1 loops=1)
      ->  CTE Scan on final  (cost=0.00..0.02 rows=1 width=32) (actual time=359.224..359.225 rows=1 loops=1)
    Planning time: 0.546 ms
    Execution time: 10186.026 ms
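Additional information
PostgreSQL version: 9.6.1
The results table you can see above has about 4.6 million rows, and every row contains the subjects column of type jsonb, which (I guess) makes a big difference there. The students table has 4.6 million rows (exactly as many as the results table); each student's subject results are stored in the results table, linked through students.subjects_id. The schools table has 6323 rows, linked to the students table by schools.school_number = students.school_id.
Sample output of the subjects column: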
    [{"grade": "D", "subject": "HIST"}, {"grade": "D", "subject": "GEO"}, {"grade": "D", "subject": "KISW"}, {"grade": "C", "subject": "ENGL"}, {"grade": "D", "subject": "LIT ENG"}]
    [{"grade": "D", "subject": "CIV"}, {"grade": "D", "subject": "GEO"}, {"grade": "D", "subject": "KISW"}, {"grade": "D", "subject": "ENGL"}]
    [{"grade": "C", "subject": "CIV"}, {"grade": "D", "subject": "KISW"}, {"grade": "B", "subject": "ENGL"}, {"grade": "A", "subject": "CHEM"}, {"grade": "A", "subject": "BIO"}, {"grade": "B", "subject": "ENG SC"}, {"grade": "C", "subject": "B/MATH"}, {"grade": "D", "subject": "ELECT INST"}, {"grade": "D", "subject": "ELECT ENG SC"}, {"grade": "F", "subject": "ELECT DRAUGHT"}]
    [{"grade": "F", "subject": "CIV"}, {"grade": "F", "subject": "GEO"}, {"grade": "C", "subject": "E/D/KIISLAMU"}, {"grade": "F", "subject": "KISW"}, {"grade": "F", "subject": "ENGL"}, {"grade": "F", "subject": "LIT ENG"}, {"grade": "C", "subject": "ARABIC"}]
    [{"grade": "F", "subject": "CIV"}, {"grade": "F", "subject": "HIST"}, {"grade": "F", "subject": "GEO"}, {"grade": "F", "subject": "KISW"}, {"grade": "F", "subject": "ENGL"}, {"grade": "F", "subject": "BIO"}, {"grade": "F", "subject": "B/MATH"}]
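I agree that your structure looks a bit odd and is not normalized. Your indexes are not doing much (the plan shows sequential scans on all three tables), but that can be fixed. You probably want to index the elements of the JSON, such as subject and grade.
Since this is not an easy topic to explain, you may want to look at this blog post, which walks through it with a set of examples: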
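As a rough illustration of what indexing the filter columns and the JSON elements could look like, here is a minimal sketch. The table and column names come from the question; the index names idx_results_year_type and idx_results_subjects_gin are made up for this example. Whether the planner actually picks these indexes depends on your data and statistics, so check with EXPLAIN ANALYZE before relying on them.

```sql
-- Sketch only: index names are hypothetical; tables and columns are from the question.

-- B-tree index for the equality filters on results. In the posted plan the
-- Seq Scan on results discards 4199798 rows with exactly this filter.
CREATE INDEX idx_results_year_type ON results ("examYear", "examType");

-- GIN index on the jsonb column so that containment (@>) lookups can use it.
CREATE INDEX idx_results_subjects_gin ON results USING gin (subjects jsonb_path_ops);

-- A containment predicate the GIN index can serve: rows whose subjects array
-- holds at least one element with "subject" = "B/MATH".
-- Note: this counts result rows only; it skips the joins to students and schools.
SELECT count(*)
FROM results
WHERE "examYear" = '2010'
  AND "examType" = 'CSEE'
  AND subjects @> '[{"subject": "B/MATH"}]';
```

The jsonb_path_ops operator class only supports the @> operator, which is all this predicate needs; the default GIN operator class would also work and supports the key-existence operators as well, usually at the cost of a larger index.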
http://bitnine.net/blog-postgresql/postgresql-internals-jsonb-type-and-its-indexes/?ckattempt=1
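Independently of indexing, the plan in the question suggests that most of the time goes into building the unwrapper CTE (about 8.5 of the 10 seconds) and then scanning it three times. On 9.6 every CTE is materialized, so one thing that may be worth trying is a single pass with aggregate FILTER clauses instead of three counting CTEs. The following is a sketch under that assumption, not a tested replacement; the aliases s, elem, counts and total are invented here, and "examYear" and "examType" are qualified with results because the plan shows the filter on that table.

```sql
-- Sketch: same tables and columns as the question, one pass over the expanded jsonb.
SELECT
    fails  AS "Number of Fails",
    passes AS "Number of passes",
    total  AS "Number of All Students",
    -- NULLIF avoids a division-by-zero error when no row matches
    -- (the original query would raise an error in that case).
    ROUND(fails  / NULLIF(total, 0) * 100, 2) AS "Percent of Fails",
    ROUND(passes / NULLIF(total, 0) * 100, 2) AS "Percent of Passes"
FROM (
    SELECT
        (COUNT(*) FILTER (WHERE s.elem ->> 'grade' IN ('D', 'F', 'X')))::numeric AS fails,
        (COUNT(*) FILTER (WHERE s.elem ->> 'grade' IN ('A', 'B', 'C')))::numeric AS passes,
        COUNT(*)::numeric AS total
    FROM students
    INNER JOIN schools ON students.school_id = schools.school_number
    INNER JOIN results ON results.id = students.subjects_id
    -- Expand each subjects array into one row per {"grade", "subject"} object.
    CROSS JOIN LATERAL jsonb_array_elements(results.subjects) AS s(elem)
    WHERE results."examYear" = '2010'
      AND results."examType" = 'CSEE'
      AND s.elem ->> 'subject' = 'B/MATH'
) AS counts;
```

FILTER and jsonb_array_elements have been available since 9.4 and LATERAL since 9.3, so this should run on 9.6.1. Compare its EXPLAIN ANALYZE output with the original before adopting it.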