Index

索引的非最佳使用

  • April 21, 2016

我對下面的場景有些困惑。涉及的表格(相關部分)定義為:

CREATE TABLE Nya. UPSEC_COURSE (
   UPSEC_COURSE_ID             CHAR(11) NOT NULL,
   CORE_UPSEC_SUBJECT          SMALLINT NOT NULL WITH DEFAULT 0,
   POINT                       SMALLINT NOT NULL WITH DEFAULT 0,
   CONVERTEDPOINT              SMALLINT,
   UPSEC_COURSETYPE_ID         CHAR(5) NOT NULL with default 'N',
   UPSEC_COURSE_SHORT          CHAR(20) NOT NULL with default,
   EQ_UPSEC_SUBJECT_ID         CHAR(5),
   UPSEC_MARKSCALE_ID          SMALLINT NOT NULL,
   SUBMIT_BY                   VARCHAR(130) NOT NULL , 
   SUBMIT_TIME                 TIMESTAMP NOT NULL WITH EFAULT     CURRENT TIMESTAMP, 
   UPSEC_COURSE                VARCHAR(64) NOT NULL with default
) IN USERSPACE1 @

CREATE UNIQUE INDEX NYA.XPK_UPSEC_COURSE ON NYA.UPSEC_COURSE 
   (UPSEC_COURSE_ID) INCLUDE (UPSEC_COURSE, UPSEC_MARKSCALE_ID, EQ_UPSEC_SUBJECT_ID
                         ,UPSEC_COURSETYPE_ID, CONVERTEDPOINT, CORE_UPSEC_SUBJECT)
ALLOW REVERSE SCANS 
COLLECT SAMPLED DETAILED STATISTICS @

所關注的查詢是:

Original Statement:
------------------
SELECT 
 RTRIM(UC.EQ_UPSEC_SUBJECT_ID) AS EQ_UPSEC_SUBJECT_ID,
 LEV.UPSEC_SUBJECTLEVEL_ID,
 RTRIM(UC.UPSEC_COURSETYPE_ID) 
FROM nya.UPSEC_COURSE UC 
JOIN nya.UPSEC_SUBJECTLEVEL_COURSE LEV 
  ON UC.UPSEC_COURSE_ID = LEV.UPSEC_COURSE_ID 
WHERE LEV.UPSEC_COURSE_ID = ? 
 and UC.UPSEC_COURSETYPE_ID in (?, ?) 
 AND EXISTS (
     SELECT 1 
     FROM NYA.UPSEC_COURSE UC2 
     JOIN NYA.UPSEC_COURSE_WEIGHTED UCW 
         ON UC2.UPSEC_COURSE_ID = UCW.UPSEC_COURSE_ID 
     WHERE UC2.EQ_UPSEC_SUBJECT_ID = UC.EQ_UPSEC_SUBJECT_ID
 )

所有索引和表都被重組並具有更新的統計資訊。

由於散列連接,查詢每次執行讀取大約 10000 行(我用這些數字中的常量替換了參數標記)

NUM_EXECUTIONS       ROWS_READ            POOL_DATA_L_READS    POOL_INDEX_L_READS  
-------------------- -------------------- -------------------- --------------------
                  5                50535                 1415                  145

Access Plan:
-----------
       Total Cost:             665.512
       Query Degree:           1

                                 Rows 
                                RETURN
                                (   1)
                                 Cost 
                                  I/O 
                                  |
                                   1 
                                TBSCAN
                                (   2)
                                665.512 
                                  296 
                                  |
                                   1 
                                SORT  
                                (   3)
                                665.512 
                                  296 
                                  |
                                6.04348 
                                ^HSJOIN
                                (   4)
                                665.511 
                                  296 
                    /-------------+-------------\
                6.04348                            1 
                HSJOIN^                         IXSCAN
                (   5)                          (  10)
                652.652                         12.8584 
                  295                              1 
        /---------+---------\                     |
     10014                   2641                2663 
    TBSCAN                  NLJOIN          INDEX: NYA     
    (   6)                  (   7)   XPK_UPSEC_SUBJECTLEVEL_COURSE
    482.247                 169.719               Q6
      271                     24 
      |              /--------+--------\
     10014          1                   2641 
TABLE: NYA       IXSCAN                IXSCAN
 UPSEC_COURSE    (   8)                (   9)
      Q5         25.7102               144.009 
                    2                    22 
                   |                     |
                  10014                 2641 
             INDEX: NYA            INDEX: NYA     
            XPK_UPSEC_COURSE  XPK_UPSEC_COURSE_WEIGHTED
                   Q7                    Q4

如前所述,表和索引已重組並更新了統計資訊。如果我創建一個幾乎相同的索引

CREATE UNIQUE INDEX TMP.XPK_UPSEC_COURSE ON NYA.UPSEC_COURSE 
(UPSEC_COURSE_ID) INCLUDE (UPSEC_COURSE, UPSEC_MARKSCALE_ID, EQ_UPSEC_SUBJECT_ID
                         ,UPSEC_COURSETYPE_ID, CONVERTEDPOINT, CORE_UPSEC_SUBJECT, SUBMIT_BY)
ALLOW REVERSE SCANS 
COLLECT SAMPLED DETAILED STATISTICS @

即唯一的區別是我在 INCLUDE 中包含了完全不相關的 SUBMIT 列,我得到了更好的數字:

NUM_EXECUTIONS       ROWS_READ            POOL_DATA_L_READS    POOL_INDEX_L_READS  
-------------------- -------------------- -------------------- --------------------
                  5                    0                    0                  125 

計劃改為:

                         Rows 
                         RETURN
                         (   1)
                          Cost 
                           I/O 
                           |
                            1 
                         NLJOIN
                         (   2)
                         123.687 
                         34.985 
                /----------+----------\
               1                         1 
            IXSCAN                    ^NLJOIN
            (   3)                    (   5)
            12.8584                   110.829 
               1                      33.985 
              |                     /---+----\
             2663                  1         6.14874 
        INDEX: NYA              IXSCAN       FILTER
 XPK_UPSEC_SUBJECTLEVEL_COURSE  (   6)       (   7)
              Q5                25.7101      609.94 
                                   2         229.197 
                                  |            |
                                 10107        2687 
                            INDEX: NYA       ^MSJOIN
                           XPK_UPSEC_COURSE  (   8)
                                  Q6         609.77 
                                             229.197 
                                        /------+------\
                                     2687                1 
                                    IXSCAN            FILTER
                                    (   9)            (  10)
                                    144.02            471.243 
                                      22                210 
                                      |                 |
                                     2687              10107 
                                INDEX: NYA            IXSCAN
                           XPK_UPSEC_COURSE_WEIGHTED  (  11)
                                      Q3              471.243 
                                                        210 
                                                        |
                                                       10107 
                                                  INDEX: TMP     
                                                 XPK_UPSEC_COURSE
                                                        Q4

如果我使用原始索引定義在另一個模式中創建表,我也會得到這個很好的行為。

問題是為什麼 db2 堅持以它的方式使用索引?我覺得索引不健康,應該刪除並重新創建。唯一的問題是大約 50 個外鍵引用了索引支持的 pk,所以我寧願避免這種情況。除了重組(我嘗試過),可以做些什麼來改變計劃?

我注意到INDEXREC = RESTART,因此將索引標記為無效可能要等到重新啟動才會生效

有什麼想法嗎?

編輯:在表 nya.upsec_course 上添加了 db2 reorgchk 的結果

Table statistics:

F1: 100 * OVERFLOW / CARD < 5
F2: 100 * (Effective Space Utilization of Data Pages) > 70
F3: 100 * (Required Pages / Total Pages) > 80

SCHEMA.NAME                     CARD     OV     NP     FP ACTBLK    TSIZE  F1  F2  F3 REORG
----------------------------------------------------------------------------------------
Table: NYA.UPSEC_COURSE
                              10107      0    273    273      -  1081449   0 100 100 --- 
----------------------------------------------------------------------------------------

Index statistics:

F4: CLUSTERRATIO or normalized CLUSTERFACTOR > 80
F5: 100 * (Space used on leaf pages / Space available on non-empty leaf pages) > MIN(50, (100 - PCTFREE))
F6: (100 - PCTFREE) * (Amount of space available in an index with one less level / Amount of space required for all keys) < 100
F7: 100 * (Number of pseudo-deleted RIDs / Total number of RIDs) < 20
F8: 100 * (Number of pseudo-empty leaf pages / Total number of leaf pages) < 20

SCHEMA.NAME                 INDCARD  LEAF ELEAF LVLS  NDEL    KEYS LEAF_RECSIZE NLEAF_RECSIZE LEAF_PAGE_OVERHEAD NLEAF_PAGE_OVERHEAD  PCT_PAGES_SAVED  F4  F5  F6  F7  F8 REORG  
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Table: NYA.UPSEC_COURSE
Index: NYA.XAK1_UPSEC_COURSE
                             10107    77     0    2     0   10107           16            16                416                 416                0  87  93   -   0   0 ----- 
Index: NYA.XPK_UPSEC_COURSE
                             10107   168     0    3     0   10107           47            11                296                 496                0  87  90  73   0   0 ----- 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

我已經好幾年沒用過 reorgchk(我通常只看溢出的數量來確定是否需要 reorg),但據我所知,這些數字看起來足夠好。

有關所涉及的其他表的詳細資訊。UPSEC_COURSE_WEIGHTED 是 UPSEC_COURSE 的子集:

CREATE TABLE NYA.UPSEC_COURSE_WEIGHTED (
   UPSEC_COURSE_ID CHAR(11) NOT NULL
) IN USERSPACE1 @

CREATE UNIQUE INDEX NYA.XPK_UPSEC_COURSE_WEIGHTED ON NYA.UPSEC_COURSE_WEIGHTED
   ( UPSEC_COURSE_ID ) 
ALLOW REVERSE SCANS 
COLLECT SAMPLED DETAILED STATISTICS @  

ALTER TABLE NYA.UPSEC_COURSE_WEIGHTED ADD CONSTRAINT XPK_UPSEC_COURSE_WEIGHTED
   PRIMARY KEY ( UPSEC_COURSE_ID ) @

ALTER TABLE NYA.UPSEC_COURSE_WEIGHTED ADD CONSTRAINT XFK_UPSEC_COURSE
   FOREIGN KEY (UPSEC_COURSE_ID)
   REFERENCES NYA.UPSEC_COURSE (UPSEC_COURSE_ID)
       ON DELETE CASCADE
       ON UPDATE RESTRICT @

表 nya.upsec_course_weighted 的重組

F1: 100 * OVERFLOW / CARD < 5
F2: 100 * (Effective Space Utilization of Data Pages) > 70
F3: 100 * (Required Pages / Total Pages) > 80

SCHEMA.NAME                     CARD     OV     NP     FP ACTBLK    TSIZE  F1  F2  F3 REORG
----------------------------------------------------------------------------------------
Table: NYA.UPSEC_COURSE_WEIGHTED
                               2687      0     15     15      -    56427   0 100 100 --- 
----------------------------------------------------------------------------------------

Index statistics:

F4: CLUSTERRATIO or normalized CLUSTERFACTOR > 80
F5: 100 * (Space used on leaf pages / Space available on non-empty leaf pages) > MIN(50, (100 - PCTFREE))
F6: (100 - PCTFREE) * (Amount of space available in an index with one less level / Amount of space required for all keys) < 100
F7: 100 * (Number of pseudo-deleted RIDs / Total number of RIDs) < 20
F8: 100 * (Number of pseudo-empty leaf pages / Total number of leaf pages) < 20

SCHEMA.NAME                 INDCARD  LEAF ELEAF LVLS  NDEL    KEYS LEAF_RECSIZE NLEAF_RECSIZE LEAF_PAGE_OVERHEAD NLEAF_PAGE_OVERHEAD  PCT_PAGES_SAVED  F4  F5  F6  F7  F8 REORG  
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Table: NYA.UPSEC_COURSE_WEIGHTED
Index: NYA.XPK_UPSEC_COURSE_WEIGHTED
                              2687    22     0    2    12    2687           11            11                496                 496                0  98  74   -   0   0 ----- 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

FWIW,我嘗試調查數據庫副本中新創建的索引與 syscat.indexes、sysibm.sysindexes 和 sysstat.indexes 中的原始索引之間的差異。與舊指數(87)相比,我能發現的唯一區別是新指數的密度(100)略高。

因此,我最終重新創建了索引,現在執行計劃要好得多。我創建了一個小的 python 腳本來為指向表的 fk 生成 drop 和 create 語句。我手頭有一個舊的 DB2.py 驅動程序,因此以下程式碼段無法與 10.5 附帶的官方 python 驅動程序一起使用。儘管如此,它應該很容易採用,所以無論如何我都會添加程式碼:

#!/usr/bin/python

import sys
import getopt
import DB2

def main():
   try:
       opts, args = getopt.getopt(sys.argv[1:], "d:t:s:u:p:")
   except getopt.GetoptError:
       sys.exit(-1)
   for o, a in opts:
       if (o == "-d"):
           dbname = a
       if (o == "-t"):
           tables = a.split(',')
       if (o == "-s"):
           schema = a
       if (o == "-u"):
           user = a
       if (o == "-p"):
           pwd = a

   conn = DB2.connect(dsn=dbname, uid=user, pwd=pwd)
   fk_sql = """
       select tabschema, tabname, constname, FK_COLNAMES,
              reftabschema, reftabname, PK_COLNAMES,
              case DELETERULE WHEN 'C' then 'CASCADE' 
                   WHEN 'R' then 'RESTRICT' 
                   ELSE 'NO ACTION' end,
              case UPDATERULE WHEN 'C' then 'CASCADE'  
                   WHEN 'R' then 'RESTRICT'  
                   ELSE 'NO ACTION' end
       from syscat.references 
       where reftabname = ? and reftabschema = ?"""
   c1 = conn.cursor()
   c2 = conn.cursor()

   create_stmts = []
   drop_stmts = []

   for t in tables:
       c1.execute(fk_sql, (t,schema))
       restore_sql = ''
       for row in c1.fetchall():
           tabscema    = row[0]
           tabname     = row[1]
           constname   = row[2]
           FK_COLNAMES = row[3]
           reftabschema= row[4]
           reftabname  = row[5]
           PK_COLNAMES = row[6]
           DELETERULE  = row[7]
           UPDATERULE  = row[8]

           fk = filter ((lambda x:x<>''), FK_COLNAMES.split(' '))
           cols = ''
           for c in fk: cols = cols + ',' + c
           fkcols = cols[1:]

           pk = filter ((lambda x:x<>''), PK_COLNAMES.split(' '))
           cols = ''
           for c in pk: cols = cols + ',' + c
           pkcols = cols[1:]

           create = """
               alter table %s.%s add constraint %s
                   foreign key (%s)
                       references %s.%s (%s)
                           on update %s
                           on delete %s;
           """ % (tabscema,tabname,constname,fkcols,reftabschema, reftabname,pkcols,UPDATERULE,DELETERULE)


           drop = """
               alter table %s.%s drop constraint %s;
           """ % (tabscema,tabname,constname)

           create_stmts.append(create)
           drop_stmts.append(drop)

       conn.rollback()
       for x in drop_stmts:
           print x

       for x in create_stmts:
           print x

if __name__ == "__main__":
   main()

引用自:https://dba.stackexchange.com/questions/135617