索引的非最佳使用
我對下面的場景有些困惑。涉及的表格(相關部分)定義為:
CREATE TABLE Nya. UPSEC_COURSE ( UPSEC_COURSE_ID CHAR(11) NOT NULL, CORE_UPSEC_SUBJECT SMALLINT NOT NULL WITH DEFAULT 0, POINT SMALLINT NOT NULL WITH DEFAULT 0, CONVERTEDPOINT SMALLINT, UPSEC_COURSETYPE_ID CHAR(5) NOT NULL with default 'N', UPSEC_COURSE_SHORT CHAR(20) NOT NULL with default, EQ_UPSEC_SUBJECT_ID CHAR(5), UPSEC_MARKSCALE_ID SMALLINT NOT NULL, SUBMIT_BY VARCHAR(130) NOT NULL , SUBMIT_TIME TIMESTAMP NOT NULL WITH EFAULT CURRENT TIMESTAMP, UPSEC_COURSE VARCHAR(64) NOT NULL with default ) IN USERSPACE1 @ CREATE UNIQUE INDEX NYA.XPK_UPSEC_COURSE ON NYA.UPSEC_COURSE (UPSEC_COURSE_ID) INCLUDE (UPSEC_COURSE, UPSEC_MARKSCALE_ID, EQ_UPSEC_SUBJECT_ID ,UPSEC_COURSETYPE_ID, CONVERTEDPOINT, CORE_UPSEC_SUBJECT) ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS @
所關注的查詢是:
Original Statement: ------------------ SELECT RTRIM(UC.EQ_UPSEC_SUBJECT_ID) AS EQ_UPSEC_SUBJECT_ID, LEV.UPSEC_SUBJECTLEVEL_ID, RTRIM(UC.UPSEC_COURSETYPE_ID) FROM nya.UPSEC_COURSE UC JOIN nya.UPSEC_SUBJECTLEVEL_COURSE LEV ON UC.UPSEC_COURSE_ID = LEV.UPSEC_COURSE_ID WHERE LEV.UPSEC_COURSE_ID = ? and UC.UPSEC_COURSETYPE_ID in (?, ?) AND EXISTS ( SELECT 1 FROM NYA.UPSEC_COURSE UC2 JOIN NYA.UPSEC_COURSE_WEIGHTED UCW ON UC2.UPSEC_COURSE_ID = UCW.UPSEC_COURSE_ID WHERE UC2.EQ_UPSEC_SUBJECT_ID = UC.EQ_UPSEC_SUBJECT_ID )
所有索引和表都被重組並具有更新的統計資訊。
由於散列連接,查詢每次執行讀取大約 10000 行(我用這些數字中的常量替換了參數標記)
NUM_EXECUTIONS ROWS_READ POOL_DATA_L_READS POOL_INDEX_L_READS -------------------- -------------------- -------------------- -------------------- 5 50535 1415 145 Access Plan: ----------- Total Cost: 665.512 Query Degree: 1 Rows RETURN ( 1) Cost I/O | 1 TBSCAN ( 2) 665.512 296 | 1 SORT ( 3) 665.512 296 | 6.04348 ^HSJOIN ( 4) 665.511 296 /-------------+-------------\ 6.04348 1 HSJOIN^ IXSCAN ( 5) ( 10) 652.652 12.8584 295 1 /---------+---------\ | 10014 2641 2663 TBSCAN NLJOIN INDEX: NYA ( 6) ( 7) XPK_UPSEC_SUBJECTLEVEL_COURSE 482.247 169.719 Q6 271 24 | /--------+--------\ 10014 1 2641 TABLE: NYA IXSCAN IXSCAN UPSEC_COURSE ( 8) ( 9) Q5 25.7102 144.009 2 22 | | 10014 2641 INDEX: NYA INDEX: NYA XPK_UPSEC_COURSE XPK_UPSEC_COURSE_WEIGHTED Q7 Q4
如前所述,表和索引已重組並更新了統計資訊。如果我創建一個幾乎相同的索引
CREATE UNIQUE INDEX TMP.XPK_UPSEC_COURSE ON NYA.UPSEC_COURSE (UPSEC_COURSE_ID) INCLUDE (UPSEC_COURSE, UPSEC_MARKSCALE_ID, EQ_UPSEC_SUBJECT_ID ,UPSEC_COURSETYPE_ID, CONVERTEDPOINT, CORE_UPSEC_SUBJECT, SUBMIT_BY) ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS @
即唯一的區別是我在 INCLUDE 中包含了完全不相關的 SUBMIT 列,我得到了更好的數字:
NUM_EXECUTIONS ROWS_READ POOL_DATA_L_READS POOL_INDEX_L_READS -------------------- -------------------- -------------------- -------------------- 5 0 0 125
計劃改為:
Rows RETURN ( 1) Cost I/O | 1 NLJOIN ( 2) 123.687 34.985 /----------+----------\ 1 1 IXSCAN ^NLJOIN ( 3) ( 5) 12.8584 110.829 1 33.985 | /---+----\ 2663 1 6.14874 INDEX: NYA IXSCAN FILTER XPK_UPSEC_SUBJECTLEVEL_COURSE ( 6) ( 7) Q5 25.7101 609.94 2 229.197 | | 10107 2687 INDEX: NYA ^MSJOIN XPK_UPSEC_COURSE ( 8) Q6 609.77 229.197 /------+------\ 2687 1 IXSCAN FILTER ( 9) ( 10) 144.02 471.243 22 210 | | 2687 10107 INDEX: NYA IXSCAN XPK_UPSEC_COURSE_WEIGHTED ( 11) Q3 471.243 210 | 10107 INDEX: TMP XPK_UPSEC_COURSE Q4
如果我使用原始索引定義在另一個模式中創建表,我也會得到這個很好的行為。
問題是為什麼 db2 堅持以它的方式使用索引?我覺得索引不健康,應該刪除並重新創建。唯一的問題是大約 50 個外鍵引用了索引支持的 pk,所以我寧願避免這種情況。除了重組(我嘗試過),可以做些什麼來改變計劃?
我注意到INDEXREC = RESTART,因此將索引標記為無效可能要等到重新啟動才會生效
有什麼想法嗎?
編輯:在表 nya.upsec_course 上添加了 db2 reorgchk 的結果
Table statistics: F1: 100 * OVERFLOW / CARD < 5 F2: 100 * (Effective Space Utilization of Data Pages) > 70 F3: 100 * (Required Pages / Total Pages) > 80 SCHEMA.NAME CARD OV NP FP ACTBLK TSIZE F1 F2 F3 REORG ---------------------------------------------------------------------------------------- Table: NYA.UPSEC_COURSE 10107 0 273 273 - 1081449 0 100 100 --- ---------------------------------------------------------------------------------------- Index statistics: F4: CLUSTERRATIO or normalized CLUSTERFACTOR > 80 F5: 100 * (Space used on leaf pages / Space available on non-empty leaf pages) > MIN(50, (100 - PCTFREE)) F6: (100 - PCTFREE) * (Amount of space available in an index with one less level / Amount of space required for all keys) < 100 F7: 100 * (Number of pseudo-deleted RIDs / Total number of RIDs) < 20 F8: 100 * (Number of pseudo-empty leaf pages / Total number of leaf pages) < 20 SCHEMA.NAME INDCARD LEAF ELEAF LVLS NDEL KEYS LEAF_RECSIZE NLEAF_RECSIZE LEAF_PAGE_OVERHEAD NLEAF_PAGE_OVERHEAD PCT_PAGES_SAVED F4 F5 F6 F7 F8 REORG ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Table: NYA.UPSEC_COURSE Index: NYA.XAK1_UPSEC_COURSE 10107 77 0 2 0 10107 16 16 416 416 0 87 93 - 0 0 ----- Index: NYA.XPK_UPSEC_COURSE 10107 168 0 3 0 10107 47 11 296 496 0 87 90 73 0 0 ----- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
我已經好幾年沒用過 reorgchk(我通常只看溢出的數量來確定是否需要 reorg),但據我所知,這些數字看起來足夠好。
有關所涉及的其他表的詳細資訊。UPSEC_COURSE_WEIGHTED 是 UPSEC_COURSE 的子集:
CREATE TABLE NYA.UPSEC_COURSE_WEIGHTED ( UPSEC_COURSE_ID CHAR(11) NOT NULL ) IN USERSPACE1 @ CREATE UNIQUE INDEX NYA.XPK_UPSEC_COURSE_WEIGHTED ON NYA.UPSEC_COURSE_WEIGHTED ( UPSEC_COURSE_ID ) ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS @ ALTER TABLE NYA.UPSEC_COURSE_WEIGHTED ADD CONSTRAINT XPK_UPSEC_COURSE_WEIGHTED PRIMARY KEY ( UPSEC_COURSE_ID ) @ ALTER TABLE NYA.UPSEC_COURSE_WEIGHTED ADD CONSTRAINT XFK_UPSEC_COURSE FOREIGN KEY (UPSEC_COURSE_ID) REFERENCES NYA.UPSEC_COURSE (UPSEC_COURSE_ID) ON DELETE CASCADE ON UPDATE RESTRICT @
表 nya.upsec_course_weighted 的重組
F1: 100 * OVERFLOW / CARD < 5 F2: 100 * (Effective Space Utilization of Data Pages) > 70 F3: 100 * (Required Pages / Total Pages) > 80 SCHEMA.NAME CARD OV NP FP ACTBLK TSIZE F1 F2 F3 REORG ---------------------------------------------------------------------------------------- Table: NYA.UPSEC_COURSE_WEIGHTED 2687 0 15 15 - 56427 0 100 100 --- ---------------------------------------------------------------------------------------- Index statistics: F4: CLUSTERRATIO or normalized CLUSTERFACTOR > 80 F5: 100 * (Space used on leaf pages / Space available on non-empty leaf pages) > MIN(50, (100 - PCTFREE)) F6: (100 - PCTFREE) * (Amount of space available in an index with one less level / Amount of space required for all keys) < 100 F7: 100 * (Number of pseudo-deleted RIDs / Total number of RIDs) < 20 F8: 100 * (Number of pseudo-empty leaf pages / Total number of leaf pages) < 20 SCHEMA.NAME INDCARD LEAF ELEAF LVLS NDEL KEYS LEAF_RECSIZE NLEAF_RECSIZE LEAF_PAGE_OVERHEAD NLEAF_PAGE_OVERHEAD PCT_PAGES_SAVED F4 F5 F6 F7 F8 REORG ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Table: NYA.UPSEC_COURSE_WEIGHTED Index: NYA.XPK_UPSEC_COURSE_WEIGHTED 2687 22 0 2 12 2687 11 11 496 496 0 98 74 - 0 0 ----- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
FWIW,我嘗試調查數據庫副本中新創建的索引與 syscat.indexes、sysibm.sysindexes 和 sysstat.indexes 中的原始索引之間的差異。與舊指數(87)相比,我能發現的唯一區別是新指數的密度(100)略高。
因此,我最終重新創建了索引,現在執行計劃要好得多。我創建了一個小的 python 腳本來為指向表的 fk 生成 drop 和 create 語句。我手頭有一個舊的 DB2.py 驅動程序,因此以下程式碼段無法與 10.5 附帶的官方 python 驅動程序一起使用。儘管如此,它應該很容易採用,所以無論如何我都會添加程式碼:
#!/usr/bin/python import sys import getopt import DB2 def main(): try: opts, args = getopt.getopt(sys.argv[1:], "d:t:s:u:p:") except getopt.GetoptError: sys.exit(-1) for o, a in opts: if (o == "-d"): dbname = a if (o == "-t"): tables = a.split(',') if (o == "-s"): schema = a if (o == "-u"): user = a if (o == "-p"): pwd = a conn = DB2.connect(dsn=dbname, uid=user, pwd=pwd) fk_sql = """ select tabschema, tabname, constname, FK_COLNAMES, reftabschema, reftabname, PK_COLNAMES, case DELETERULE WHEN 'C' then 'CASCADE' WHEN 'R' then 'RESTRICT' ELSE 'NO ACTION' end, case UPDATERULE WHEN 'C' then 'CASCADE' WHEN 'R' then 'RESTRICT' ELSE 'NO ACTION' end from syscat.references where reftabname = ? and reftabschema = ?""" c1 = conn.cursor() c2 = conn.cursor() create_stmts = [] drop_stmts = [] for t in tables: c1.execute(fk_sql, (t,schema)) restore_sql = '' for row in c1.fetchall(): tabscema = row[0] tabname = row[1] constname = row[2] FK_COLNAMES = row[3] reftabschema= row[4] reftabname = row[5] PK_COLNAMES = row[6] DELETERULE = row[7] UPDATERULE = row[8] fk = filter ((lambda x:x<>''), FK_COLNAMES.split(' ')) cols = '' for c in fk: cols = cols + ',' + c fkcols = cols[1:] pk = filter ((lambda x:x<>''), PK_COLNAMES.split(' ')) cols = '' for c in pk: cols = cols + ',' + c pkcols = cols[1:] create = """ alter table %s.%s add constraint %s foreign key (%s) references %s.%s (%s) on update %s on delete %s; """ % (tabscema,tabname,constname,fkcols,reftabschema, reftabname,pkcols,UPDATERULE,DELETERULE) drop = """ alter table %s.%s drop constraint %s; """ % (tabscema,tabname,constname) create_stmts.append(create) drop_stmts.append(drop) conn.rollback() for x in drop_stmts: print x for x in create_stmts: print x if __name__ == "__main__": main()