PostgreSQL: How can I optimize the performance of a query that uses CTEs and has a jsonb column?

  • March 22, 2017

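I am querying three joined tables. I use CTEs to flatten one of the tables' jsonb columns (converting the jsonb into tabular form), and then query that dynamically generated table so that I can count individual values inside the jsonb column.

I am using this code: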

WITH students_query AS (
   SELECT student_number, "examYear", school_name, subjects
   FROM students
   INNER JOIN schools ON students.school_id = schools.school_number
   INNER JOIN results ON results.id = students.subjects_id
   WHERE
       "examYear" = '2010'
       AND
       "examType" = 'CSEE'
), subjects_array AS (
   SELECT jsonb_array_elements(subjects) AS subject_list
   FROM students_query
), unwrapper AS (
   SELECT 
       x.*
       FROM subjects_array,
       jsonb_to_record(subject_list) AS x(
           subject varchar(25),
           grade varchar(2)
       )
), failures AS (
   SELECT 
       COUNT(*)::numeric AS fails
       FROM unwrapper
       WHERE 
           "subject" = 'B/MATH' AND "grade" IN ('D', 'F', 'X')

), passes AS (
   SELECT
       COUNT(*)::numeric AS passes
       FROM unwrapper
       WHERE
           "subject" = 'B/MATH' AND "grade" IN ('A', 'B', 'C')
), final AS (
   SELECT
       COUNT(*)::numeric AS allStudents
       FROM unwrapper
       WHERE 
           "subject" = 'B/MATH'
)


SELECT 
   fails AS "Number of Fails",
   passes AS "Number of passes",
   allStudents AS "Number of All Students",
   ROUND((fails/allStudents *100), 2) AS "Percent of Fails",
   ROUND((passes/allStudents *100), 2) AS "Percent of Passes"
   FROM failures, passes, final;

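The problem is that it is slow.

This particular query takes about 10 seconds to complete, but it is an important query that most users will run, so I would really like to optimize it.

Steps I have taken

**Indexes:** I have created some indexes, but I am not sure whether they make any difference.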

drop index if exists fk_idx_results;
drop index if exists idx_results_subjects;
drop index if exists fk_idx_schools;
drop index if exists idx_schools_name;
drop index if exists fk_idx_students;

create index fk_idx_results on results("id");
create index idx_results_subjects on results("subjects", "examYear");

create index fk_idx_schools on schools("school_number");
create index idx_schools_name on schools("school_name");

create index fk_idx_students on students("id", "subjects_id", "school_id");

**Settings:** I also added some settings before the query, which brought it down to something like 9 seconds.

SET cpu_index_tuple_cost = .0005;
SET random_page_cost = 2;
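
As an illustration of how settings like these can be scoped, here is a minimal sketch that applies them with SET LOCAL so that they last for one transaction only. The work_mem line is an assumption and is not one of the settings above: the Hash node in the plan below reports "Batches: 64", which usually means the hash table did not fit in work_mem, and 256MB is only a placeholder to tune against the memory the server actually has.

```sql
BEGIN;

-- SET LOCAL reverts automatically at COMMIT or ROLLBACK,
-- so other queries in the session keep their defaults.
SET LOCAL cpu_index_tuple_cost = 0.0005;
SET LOCAL random_page_cost = 2;

-- Assumption: not one of the original settings. 256MB is a placeholder.
SET LOCAL work_mem = '256MB';

-- Run the query here, for example prefixed with EXPLAIN (ANALYZE, BUFFERS),
-- and compare the "Batches" figure of the Hash node with the earlier plan.

COMMIT;
```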

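**Now the question:** I am asking for help. I am new to PostgreSQL and to optimizing large databases in general.

Here is the EXPLAIN ANALYZE report for the query.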

"Nested Loop  (cost=6330354.67..6330354.76 rows=1 width=160) (actual time=10103.862..10103.865 rows=1 loops=1)"
"  CTE students_query"
"    ->  Hash Join  (cost=390430.66..607236.38 rows=476181 width=347) (actual time=1378.337..3543.161 rows=458487 loops=1)"
"          Hash Cond: (students.school_id = schools.school_number)"
"          ->  Hash Join  (cost=390237.54..600495.77 rows=476181 width=326) (actual time=1376.496..3407.941 rows=458487 loops=1)"
"                Hash Cond: (students.subjects_id = results.id)"
"                ->  Seq Scan on students  (cost=0.00..93850.85 rows=4658285 width=36) (actual time=0.029..616.052 rows=4658285 loops=1)"
"                ->  Hash  (cost=362894.28..362894.28 rows=476181 width=340) (actual time=1375.636..1375.636 rows=458487 loops=1)"
"                      Buckets: 16384  Batches: 64  Memory Usage: 2876kB"
"                      ->  Seq Scan on results  (cost=0.00..362894.28 rows=476181 width=340) (actual time=0.020..1187.420 rows=458487 loops=1)"
"                            Filter: (("examYear" = '2010'::text) AND ("examType" = 'CSEE'::text))"
"                            Rows Removed by Filter: 4199798"
"          ->  Hash  (cost=113.61..113.61 rows=6361 width=33) (actual time=1.831..1.831 rows=6361 loops=1)"
"                Buckets: 8192  Batches: 1  Memory Usage: 469kB"
"                ->  Seq Scan on schools  (cost=0.00..113.61 rows=6361 width=33) (actual time=0.009..0.857 rows=6361 loops=1)"
"  CTE subjects_array"
"    ->  CTE Scan on students_query  (cost=0.00..246423.67 rows=47618100 width=32) (actual time=1378.350..4638.517 rows=3473367 loops=1)"
"  CTE unwrapper"
"    ->  Nested Loop  (cost=0.00..1904724.00 rows=47618100 width=80) (actual time=1378.369..8489.830 rows=3473367 loops=1)"
"          ->  CTE Scan on subjects_array  (cost=0.00..952362.00 rows=47618100 width=32) (actual time=1378.351..5344.083 rows=3473367 loops=1)"
"          ->  Function Scan on jsonb_to_record x  (cost=0.00..0.01 rows=1 width=80) (actual time=0.001..0.001 rows=1 loops=3473367)"
"  CTE failures"
"    ->  Aggregate  (cost=1249984.05..1249984.07 rows=1 width=32) (actual time=9379.408..9379.409 rows=1 loops=1)"
"          ->  CTE Scan on unwrapper  (cost=0.00..1249975.13 rows=3571 width=0) (actual time=1378.423..9342.868 rows=387778 loops=1)"
"                Filter: (((subject)::text = 'B/MATH'::text) AND ((grade)::text = ANY ('{D,F,X}'::text[])))"
"                Rows Removed by Filter: 3085589"
"  CTE passes"
"    ->  Aggregate  (cost=1249984.05..1249984.07 rows=1 width=32) (actual time=365.217..365.217 rows=1 loops=1)"
"          ->  CTE Scan on unwrapper unwrapper_1  (cost=0.00..1249975.13 rows=3571 width=0) (actual time=0.093..363.892 rows=24002 loops=1)"
"                Filter: (((subject)::text = 'B/MATH'::text) AND ((grade)::text = ANY ('{A,B,C}'::text[])))"
"                Rows Removed by Filter: 3449365"
"  CTE final"
"    ->  Aggregate  (cost=1072002.48..1072002.49 rows=1 width=32) (actual time=359.222..359.222 rows=1 loops=1)"
"          ->  CTE Scan on unwrapper unwrapper_2  (cost=0.00..1071407.25 rows=238090 width=0) (actual time=0.005..339.228 rows=411822 loops=1)"
"                Filter: ((subject)::text = 'B/MATH'::text)"
"                Rows Removed by Filter: 3061545"
"  ->  Nested Loop  (cost=0.00..0.05 rows=1 width=64) (actual time=9744.630..9744.632 rows=1 loops=1)"
"        ->  CTE Scan on failures  (cost=0.00..0.02 rows=1 width=32) (actual time=9379.410..9379.411 rows=1 loops=1)"
"        ->  CTE Scan on passes  (cost=0.00..0.02 rows=1 width=32) (actual time=365.219..365.220 rows=1 loops=1)"
"  ->  CTE Scan on final  (cost=0.00..0.02 rows=1 width=32) (actual time=359.224..359.225 rows=1 loops=1)"
"Planning time: 0.546 ms"
"Execution time: 10186.026 ms"

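**Additional information:** PostgreSQL version: 9.6.1

The results table you can see above has about 4.6 million rows, and every row contains the subjects::jsonb column, which (I guess) makes a big difference there.

The students table has 4.6 million rows (exactly the same number as the results table). The subject results of every student are in the results table, linked via students.subjects_id.

The schools table has 6323 rows, and it is linked to the students table via schools.school_number = students.school_id.

Sample output of the subjects column: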

[{"grade": "D", "subject": "HIST"}, {"grade": "D", "subject": "GEO"}, {"grade": "D", "subject": "KISW"}, {"grade": "C", "subject": "ENGL"}, {"grade": "D", "subject": "LIT ENG"}]
[{"grade": "D", "subject": "CIV"}, {"grade": "D", "subject": "GEO"}, {"grade": "D", "subject": "KISW"}, {"grade": "D", "subject": "ENGL"}]
[{"grade": "C", "subject": "CIV"}, {"grade": "D", "subject": "KISW"}, {"grade": "B", "subject": "ENGL"}, {"grade": "A", "subject": "CHEM"}, {"grade": "A", "subject": "BIO"}, {"grade": "B", "subject": "ENG SC"},{"grade": "C", "subject": "B/MATH"}, {"grade": "D", "subject": "ELECT INST"}, {"grade": "D", "subject": "ELECT ENG SC"}, {"grade": "F", "subject": "ELECT DRAUGHT"}]
[{"grade": "F", "subject": "CIV"}, {"grade": "F", "subject": "GEO"}, {"grade": "C", "subject": "E/D/KIISLAMU"}, {"grade": "F", "subject": "KISW"}, {"grade": "F", "subject": "ENGL"}, {"grade": "F", "subject": "LIT ENG"}, {"grade": "C", "subject": "ARABIC"}]
[{"grade": "F", "subject": "CIV"}, {"grade": "F", "subject": "HIST"}, {"grade": "F", "subject": "GEO"}, {"grade": "F", "subject": "KISW"}, {"grade": "F", "subject": "ENGL"}, {"grade": "F", "subject": "BIO"}, {"grade": "F", "subject": "B/MATH"}]

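I agree that your structure looks a bit odd and is not normalized. Your indexes are not doing much, but that can be fixed. You may want to index elements of the JSON, such as subject and grade.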
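
A minimal sketch of what such indexes could look like, assuming (as the plan suggests) that "examYear", "examType" and subjects are all columns of the results table. The index names are hypothetical. Whether the planner uses either index depends on selectivity: B/MATH appears in most of the filtered rows in the plan (411822 of 458487), so a sequential scan may still win for that subject.

```sql
-- Hypothetical name. Supports the equality filter on exam year and type,
-- which currently discards about 4.2 million rows in a sequential scan.
CREATE INDEX idx_results_year_type
    ON results ("examYear", "examType");

-- Hypothetical name. A GIN index with jsonb_path_ops supports the
-- containment operator @> on the subjects array.
CREATE INDEX idx_results_subjects_gin
    ON results USING gin (subjects jsonb_path_ops);

-- Containment query that can use the GIN index: counts rows whose
-- subjects array has an element with "subject": "B/MATH".
SELECT count(*)
FROM results
WHERE "examYear" = '2010'
  AND "examType" = 'CSEE'
  AND subjects @> '[{"subject": "B/MATH"}]';
```

A more specific pattern such as '[{"subject": "B/MATH", "grade": "F"}]' works the same way, with one containment test per grade.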

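Since this is not an easy topic to explain, you may want to look at this blog post, in which the author walks through it with an example data set: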

http://bitnine.net/blog-postgresql/postgresql-internals-jsonb-type-and-its-indexes/?ckattempt=1
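
Independently of indexing, the plan in the question spends most of its time materializing all 3,473,367 array elements through two CTEs and then scanning the result three times. The following is a sketch, not a tested replacement, of a single-pass version that uses aggregate FILTER clauses (available since PostgreSQL 9.4). It assumes that subjects, "examYear" and "examType" are columns of results, and it keeps the schools join only to preserve the original row set. The aliases s and elem are made up for the example.

```sql
SELECT
    count(*) FILTER (WHERE s.elem->>'grade' IN ('D', 'F', 'X')) AS "Number of Fails",
    count(*) FILTER (WHERE s.elem->>'grade' IN ('A', 'B', 'C')) AS "Number of passes",
    count(*)                                                    AS "Number of All Students",
    -- nullif avoids a division-by-zero error when no row matches.
    round(100.0 * count(*) FILTER (WHERE s.elem->>'grade' IN ('D', 'F', 'X'))
          / nullif(count(*), 0), 2)                             AS "Percent of Fails",
    round(100.0 * count(*) FILTER (WHERE s.elem->>'grade' IN ('A', 'B', 'C'))
          / nullif(count(*), 0), 2)                             AS "Percent of Passes"
FROM students
INNER JOIN schools ON students.school_id = schools.school_number
INNER JOIN results ON results.id = students.subjects_id
-- Expand each subjects array into one row per {"grade", "subject"} object.
CROSS JOIN LATERAL jsonb_array_elements(results.subjects) AS s(elem)
WHERE results."examYear" = '2010'
  AND results."examType" = 'CSEE'
  -- Keep only the B/MATH elements before aggregating.
  AND s.elem->>'subject' = 'B/MATH';
```

Reading the fields with ->> also avoids the per-element jsonb_to_record call, which the plan shows being executed 3,473,367 times.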

Source: https://dba.stackexchange.com/questions/167854