Mysql

Mysql全文搜尋my.cnf優化

  • August 10, 2017

我在https://serverfault.com/questions/353888/mysql-full-text-search-cause-high-usage-cpu上提出了一個問題,一些使用者建議在這裡提問。

我們建立了一個新聞網站。每天我們都會從 web api 輸入數以萬計的數據。

為了提供精準的搜尋服務,我們的表格使用了MyISAM,建立了全文索引(標題、內容、日期)。我們的網站正在使用 2GB RAM、30GB 空間(無交換,因為 VDS 不允許建構交換)的 Godaddy VDS 上進行測試。CPU是Intel(R) Xeon(R) CPU L5609 @ 1.87GHz

執行後./mysqltuner.pl

我們得到一些結果:

-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.5.20
[OK] Operating on 32-bit architecture with less than 2GB RAM

-------- Storage Engine Statistics -------------------------------------------
[--] Status: -Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[--] Data in MyISAM tables: 396M (Tables: 39)
[--] Data in InnoDB tables: 208K (Tables: 8)
[!!] Total fragmented tables: 9

-------- Security Recommendations  -------------------------------------------
[!!] User '@ip-XX-XX-XX-XX.ip.secureserver.net'
[!!] User '@localhost'

-------- Performance Metrics -------------------------------------------------
[--] Up for: 17h 27m 58s (1M q [20.253 qps], 31K conn, TX: 513M, RX: 303M)
[--] Reads / Writes: 61% / 39%
[--] Total buffers: 168.0M global + 2.7M per thread (151 max threads)
[OK] Maximum possible memory usage: 573.8M (28% of installed RAM)
[OK] Slow queries: 0% (56/1M)
[!!] Highest connection usage: 100%  (152/151)
[OK] Key buffer size / total MyISAM indexes: 8.0M/162.5M
[OK] Key buffer hit rate: 100.0% (2B cached / 882K reads)
[!!] Query cache is disabled
[OK] Sorts requiring temporary tables: 0% (0 temp sorts / 17K sorts)
[!!] Temporary tables created on disk: 49% (32K on disk / 64K total)
[!!] Thread cache is disabled
[!!] Table cache hit rate: 0% (400 open / 298K opened)
[OK] Open file limit used: 41% (421/1K)
[!!] Table locks acquired immediately: 77%
[OK] InnoDB data size / buffer pool: 208.0K/128.0M

-------- Recommendations -----------------------------------------------------
General recommendations:
   Run OPTIMIZE TABLE to defragment tables for better performance
   MySQL started within last 24 hours - recommendations may be inaccurate
   Enable the slow query log to troubleshoot bad queries
   Reduce or eliminate persistent connections to reduce connection usage
   When making adjustments, make tmp_table_size/max_heap_table_size equal
   Reduce your SELECT DISTINCT queries without LIMIT clauses
   Set thread_cache_size to 4 as a starting value
   Increase table_cache gradually to avoid file descriptor limits
   Optimize queries and/or use InnoDB to reduce lock wait
Variables to adjust:
   max_connections (> 151)
   wait_timeout (< 28800)
   interactive_timeout (< 28800)
   query_cache_size (>= 8M)
   tmp_table_size (> 16M)
   max_heap_table_size (> 16M)
   thread_cache_size (start at 4)
   table_cache (> 400)

這是my.cnf

[mysqld]
port            = 3306
socket          = /tmp/mysql.sock
skip-external-locking
key_buffer_size = 256M
max_allowed_packet = 16M
max_connections = 1024
wait_timeout = 5
table_open_cache = 512
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 2M
myisam_sort_buffer_size = 128M
thread_cache_size = 8
query_cache_size= 256M
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 8
ft_min_word_len = 2
read_rnd_buffer_size=2M
tmp_table_size=128M

我不確定如何優化my.cnf取決於./mysqltuner.pl返回結果。

我有一個有趣的驚喜給你。

您可以做的唯一優化全文索引不是在 my.cnf 級別。這與兩件事有關:

  1. 停用詞列表
  2. 查詢

停用詞

543 個停用詞您可能希望或可能不希望從 FULLTEXT 索引中過濾掉。停用詞列表是在編譯時建構的。您可以使用自己的列表覆蓋該列表,如下所示:

好的,現在讓我們創建我們的停用詞列表。我通常將英文文章設置為唯一的停用詞。

echo "a"    > /var/lib/mysql/stopwords.txt
echo "an"  >> /var/lib/mysql/stopwords.txt
echo "the" >> /var/lib/mysql/stopwords.txt

接下來,將選項添加到 /etc/my.cnf 加上允許 1 個字母、2 個字母和 3 個字母的單詞

[mysqld]
ft_min_word_len=1
ft_stopword_file=/var/lib/mysql/stopwords.txt

最後重啟mysql

service mysql restart

如果您有任何具有 FULLTEXT 索引的表,則必須刪除這些 FULLTEXT 索引並重新創建它們。

詢問

這是關於使用全表索引的 MySQL 查詢的一個鮮為人知的事實:有時 MySQL 查詢優化器完全停止使用 FULLTEXT 索引並執行全表掃描。

這是一個例子:

use test
drop table if exists ft_test;
create table ft_test
(
   id int not null auto_increment,
   txt text,
   primary key (id),
   FULLTEXT (txt)
) ENGINE=MyISAM;
insert into ft_test (txt) values
('mount camaroon'),('mount camaron'),('mount camnaroon'),
('mount cameroon'),('mount cemeroon'),('mount camnaroon'),
('mount camraon'),('mount camaraon'),('mount camaran'),
('mount camnaraon'),('mount cameroan'),('mount cemeroan'),
('mount camnaraon'),('munt camraon'),('munt camaraon'),
('munt camaran'),('munt camnaraon'),('munt cameroan'),
('munt cemeroan'),('munt camnaraon'),('mount camraan');
select * from ft_test WHERE  MATCH(txt) AGAINST ("+mount +cameroon" IN BOOLEAN MODE);

這是載入的範例數據:

mysql> use test
Database changed
mysql> drop table if exists ft_test;
Query OK, 0 rows affected (0.00 sec)

mysql> create table ft_test
   -> (
   ->     id int not null auto_increment,
   ->     txt text,
   ->     primary key (id),
   ->     FULLTEXT (txt)
   -> ) ENGINE=MyISAM;
Query OK, 0 rows affected (0.03 sec)

mysql> insert into ft_test (txt) values
   -> ('mount camaroon'),('mount camaron'),('mount camnaroon'),
   -> ('mount cameroon'),('mount cemeroon'),('mount camnaroon'),
   -> ('mount camraon'),('mount camaraon'),('mount camaran'),
   -> ('mount camnaraon'),('mount cameroan'),('mount cemeroan'),
   -> ('mount camnaraon'),('munt camraon'),('munt camaraon'),
   -> ('munt camaran'),('munt camnaraon'),('munt cameroan'),
   -> ('munt cemeroan'),('munt camnaraon'),('mount camraan');
Query OK, 21 rows affected (0.00 sec)
Records: 21  Duplicates: 0  Warnings: 0

mysql>

這是一個範例查詢及其解釋計劃

mysql> select * from ft_test WHERE  MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE);
+----+----------------+
| id | txt            |
+----+----------------+
|  4 | mount cameroon |
+----+----------------+
1 row in set (0.00 sec)

mysql> explain select * from ft_test WHERE  MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE)\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: ft_test
        type: fulltext
possible_keys: txt
         key: txt
     key_len: 0
         ref:
        rows: 1
       Extra: Using where
1 row in set (0.00 sec)

mysql>

好的,使用了 FULLTEXT 索引。

現在,讓我們稍微改變一下查詢

mysql> select * from ft_test WHERE  MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE) = 1;
+----+----------------+
| id | txt            |
+----+----------------+
|  4 | mount cameroon |
+----+----------------+
1 row in set (0.00 sec)

mysql> explain select * from ft_test WHERE  MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE) = 1\G
*************************** 1. row ***************************
          id: 1
 select_type: SIMPLE
       table: ft_test
        type: ALL
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 21
       Extra: Using where
1 row in set (0.00 sec)

mysql>

OMG FULLTEXT 索引發生了什麼?MySQL 查詢優化器基本上對它嗤之以鼻。如果您正在使用 ft_test 表執行 JOIN,則一旦發出全文搜尋的 WHERE 子句並且它會執行相同的操作,那麼誰知道查詢的其餘部分會發生什麼。

解決方案是重構查詢並嘗試隔離 FULLTEXT 搜尋並僅收集鍵。然後將這些鍵 LEFT JOIN 加入到原始表中。

例子

SELECT B.*
FROM (SELECT id from ft_test
WHERE MATCH(txt) AGAINST ("+cameroon" IN BOOLEAN MODE)) A
LEFT JOIN ft_test B USING (id);

對於這個查詢,這裡是結果和它的解釋

mysql> SELECT B.*
   -> FROM (SELECT id from ft_test
   -> WHERE MATCH(txt) AGAINST ("+cameroon" IN BOOLEAN MODE)) A
   -> LEFT JOIN ft_test B USING (id);
+----+----------------+
| id | txt            |
+----+----------------+
|  4 | mount cameroon |
+----+----------------+
1 row in set (0.00 sec)

mysql> explain SELECT B.*
   -> FROM (SELECT id from ft_test
   -> WHERE MATCH(txt) AGAINST ("+cameroon" IN BOOLEAN MODE)) A
   -> LEFT JOIN ft_test B USING (id)\G
*************************** 1. row ***************************
          id: 1
 select_type: PRIMARY
       table: <derived2>
        type: system
possible_keys: NULL
         key: NULL
     key_len: NULL
         ref: NULL
        rows: 1
       Extra:
*************************** 2. row ***************************
          id: 1
 select_type: PRIMARY
       table: B
        type: const
possible_keys: PRIMARY
         key: PRIMARY
     key_len: 4
         ref: const
        rows: 1
       Extra:
*************************** 3. row ***************************
          id: 2
 select_type: DERIVED
       table: ft_test
        type: fulltext
possible_keys: txt
         key: txt
     key_len: 0
         ref:
        rows: 1
       Extra: Using where
3 rows in set (0.00 sec)

mysql>

請注意,在 EXPLAIN 計劃的 DERIVED2 部分中,確實使用了 FULLTEXT 索引。

這個故事所講的道德

您必須養成決定數據庫將有多少停用詞、創建停用詞列表、配置它,然後創建/重新創建所有 FULLTEXT 索引的習慣。您還必須養成重構 FULLTEXT 搜尋查詢的習慣,以使 MySQL 查詢優化器不會生成錯誤的 EXPLAIN 計劃或使參與 EXPLAIN 計劃的其餘查詢的索引無效。

引用自:https://dba.stackexchange.com/questions/11716