Query-Performance

選擇一個連接的大型日誌表的性能問題

  • April 1, 2020

我有一個大約 1300 萬行的 Log-Tabelle。表結構(截斷)如下:

   CREATE TABLE `dmsserver_status_log` (
     `id` bigint(20) NOT NULL,
     `customer_id` int(11) NOT NULL,
     `timestamp` datetime(6) NOT NULL,
     `level` varchar(10) NOT NULL,
     `logger_name` varchar(255) NOT NULL,
     `message` text NOT NULL,
   ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
  • id是 PK 和 AUTO_INCREMENT
  • customer_id和 timestamp 各有一個 KEY
  • customer_id具有“客戶”表的外鍵

客戶表結構(截斷)如下,大約有 1700 行:

   CREATE TABLE `customer` (
     `id` int(11) NOT NULL,
     `number` varchar(10) NOT NULL,
     [...]
   ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
  • id是 PK 和 AUTO_INCREMENT
  • 數字是唯一鍵

這些列上有更多列和更多鍵,但我無法想像這與我的性能問題有關(?)

如果我查詢我的日誌表(沒有連接),我會在短時間內收到結果:

   SELECT
       `timestamp`,
       `level`,
       logger_name,
       message
   FROM dmsserver_status_log as log
   WHERE
       `timestamp` > '2020-03-01 00:00:00'
       AND message LIKE '%fehlgeschlagen%'
   ORDER BY
       `timestamp` ASC;

在這個例子中,我在一秒鐘內獲得了 57 次點擊。這對我來說沒問題。如果我在客戶表上使用 JOIN 擴展我的選擇,則相同的選擇需要大約 30 秒或更長時間才能完成:

   SELECT
       `timestamp`,
       `level`,
       logger_name,
       message,
       customer.`number`
   FROM dmsserver_status_log as log
       JOIN customer ON customer.id = log.customer_id
   WHERE
       `timestamp` > '2020-03-01 00:00:00'
       AND message LIKE '%fehlgeschlagen%'
   ORDER BY
       `timestamp` ASC;

根據我的查詢和時間戳的範圍,這個查詢可能需要幾分鐘或幾十分鐘。我只是不明白為什麼這個選擇需要這麼長時間才能完成。

這是帶有連接的上述選擇的“解釋擴展”輸出:

   id  select_type   `table`    `type`    possible_keys            `key`          key_len    `ref`         `rows`  filtered     Extra
   1   SIMPLE        customer   index     PRIMARY                  number         32                        1749   100          Using index; Using temporary; Using filesort
   1   SIMPLE        log        ref       customer_id,timestamp    customer_id    4          customer.id     515   100          Using where

有人可以向我解釋為什麼加入第二張桌子需要這麼多時間嗎?

編輯

按照這裡的要求,結果是SHOW CREATE TABLE dmsserver_status_log

   CREATE TABLE `dmsserver_status_log` (
     `id` bigint(20) NOT NULL AUTO_INCREMENT,
     `customer_id` int(11) NOT NULL,
     `timestamp` datetime(6) NOT NULL,
     `thread_name` varchar(255) NOT NULL,
     `level` varchar(10) NOT NULL,
     `logger_name` varchar(255) NOT NULL,
     `source_host` varchar(50) NOT NULL,
     `message` text NOT NULL,
     `stacktrace` text,
     `mdc` text,
     PRIMARY KEY (`id`),
     KEY `customer_id` (`customer_id`),
     KEY `timestamp` (`timestamp`),
     CONSTRAINT `dmsserver_status_log_ibfk_1` FOREIGN KEY (`customer_id`) REFERENCES `customer` (`id`) ON UPDATE CASCADE
   ) ENGINE=InnoDB AUTO_INCREMENT=17996215 DEFAULT CHARSET=utf8

這是結果SHOW CREATE TABLE customer

   CREATE TABLE `customer` (
     `id` int(11) NOT NULL AUTO_INCREMENT,
     `number` varchar(10) NOT NULL,
     `headquarter_customer_number` varchar(10) DEFAULT NULL COMMENT 'Angabe des Hauptbetriebs wenn dies eine Filiale ist',
     `it_location_customer_number` varchar(10) DEFAULT NULL COMMENT 'Angabe des EDV-Standorts wenn dieser Kunde keine eigene EDV hat',
     `customer_type_id` int(11) NOT NULL DEFAULT '1',
     `address_id` int(11) NOT NULL,
     `characteristic_id` int(11) NOT NULL DEFAULT '1',
     `iban` varchar(34) DEFAULT NULL,
     `bic` varchar(11) DEFAULT NULL,
     `bank` varchar(100) DEFAULT NULL,
     `newsletter` tinyint(1) NOT NULL DEFAULT '1',
     `newsletter_email` varchar(100) DEFAULT NULL,
     `invoice_email` varchar(100) DEFAULT NULL COMMENT 'Email-Adresse für den Empfang von Rechnungen als PDF',
     `api_password` varchar(255) DEFAULT NULL COMMENT 'Zugangspasswort für den Zugriff auf unsere API',
     `comments` varchar(250) DEFAULT NULL,
     `valid_id` smallint(6) NOT NULL,
     `create_time` datetime NOT NULL,
     `create_by` int(11) NOT NULL,
     `change_time` datetime NOT NULL,
     `change_by` int(11) NOT NULL,
     PRIMARY KEY (`id`),
     UNIQUE KEY `number` (`number`),
     KEY `change_by` (`change_by`),
     KEY `characteristic_id` (`characteristic_id`),
     KEY `customer_type_id` (`customer_type_id`),
     KEY `address_id` (`address_id`),
     KEY `valid_id` (`valid_id`),
     KEY `create_by` (`create_by`),
     KEY `headquarter_customer_number` (`headquarter_customer_number`),
     KEY `it_location_customer_number` (`it_location_customer_number`),
     CONSTRAINT `customer_ibfk_1` FOREIGN KEY (`customer_type_id`) REFERENCES `customer_type` (`id`),
     CONSTRAINT `customer_ibfk_2` FOREIGN KEY (`address_id`) REFERENCES `address` (`id`),
     CONSTRAINT `customer_ibfk_3` FOREIGN KEY (`characteristic_id`) REFERENCES `customer_characteristic` (`id`),
     CONSTRAINT `customer_ibfk_4` FOREIGN KEY (`create_by`) REFERENCES `users` (`id`) ON UPDATE CASCADE,
     CONSTRAINT `customer_ibfk_5` FOREIGN KEY (`change_by`) REFERENCES `users` (`id`) ON UPDATE CASCADE,
     CONSTRAINT `customer_ibfk_6` FOREIGN KEY (`headquarter_customer_number`) REFERENCES `customer` (`number`) ON UPDATE CASCADE,
     CONSTRAINT `customer_ibfk_7` FOREIGN KEY (`it_location_customer_number`) REFERENCES `customer` (`number`) ON UPDATE CASCADE
   ) ENGINE=InnoDB AUTO_INCREMENT=2540 DEFAULT CHARSET=utf8

請提供EXPLAIN SELECT ...每個變體。

什麼版本的 MariaDB?(這方面的優化已經改變。)

也嘗試這些變化

這將過濾隔離到派生表中,從而最大限度地減少進入另一個表的需要。

SELECT  log.`timestamp`, log.`level`, log.logger_name, log.message,
       c.`number`
   FROM (
       SELECT  log1.customer_id, log1.`timestamp`,
               log1.`level`, log1.logger_name,
               log1.message
           FROM  dmsserver_status_log AS log1 
           WHERE  `timestamp` > '2020-03-01 00:00:00'
             AND  message LIKE '%fehlgeschlagen%'
        ) AS log
   JOIN  customer AS c  ON c.id = log.customer_id
   ORDER BY  log.`timestamp` ASC;

這個推遲以不同的方式查看第二張桌子。

SELECT   `timestamp`, `level`, logger_name, message,
        ( SELECT `number` FROM customer
                    WHERE customer.id = log.customer_id ) AS "number"
   FROM  dmsserver_status_log as log
   JOIN  customer  ON customer.id = log.customer_id
   WHERE  `timestamp` > '2020-03-01 00:00:00'
     AND  message LIKE '%fehlgeschlagen%'
   ORDER BY  `timestamp` ASC;

根據 MySQL/MariaDB 的版本,這三種變體可能會進行不同的優化。

如果您正在搜尋“單詞”,請考慮使用FULLTEXT(message)and MATCH(message) AGAINST ('+fehlgeschlagen' IN BOOLEAN MODE)。它應該執行得更快。

您在表上customer_id和表中都有單獨的索引(又名鍵)。對於任何給定的查詢,MariaDB 通常只能使用一個索引。因此,當您更改查詢以包含連接時,MariaDB 必須選擇要使用的索引之一。 timestamp``dmsserver_status_log

相反,您需要的是表上一個合適的複合索引,它可以同時支持子句、the和列中的dmsserver_status_log相關條件。WHERE``JOIN``ORDER BY

所以你需要的索引可能是CREATE INDEX dmsserver_status_log_idx1 ON (customer_id, timestamp). 請注意,列的順序很重要。(如果EXPLAIN顯示優化器不會使用此索引,那麼您可以嘗試使用相反順序的列重新創建它。查詢優化可能很困難……)

請注意,message也出現在您的WHERE子句中的列不會從具有特定查詢的複合索引中受益,因為您在搜尋字元串之前有一個萬用字元。

如果您只關心特定 JOIN 查詢的良好性能,customer_id您可能需要考慮刪除現有的單個索引。timestamp在一個表中維護大量的索引意味著更慢INSERT的 s 並且它們也佔用了一些記憶體。

作為參考,這裡有一篇關於為 MariaDB/MySQL 建構索引的文章: Building the best INDEX for a given SELECT

引用自:https://dba.stackexchange.com/questions/262214