How can I use MaxScale to limit the impact of a misbehaving application on a MariaDB Galera cluster?

October 28, 2021

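I have a Galera replication cluster with three MariaDB nodes, fronted by a MaxScale active-passive cluster that presents a single-node image to its clients.
I have a misbehaving client that opens connections and never closes them. The connection count keeps growing until the database limit is reached. To limit the number of connections, I configured the following two parameters: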
max_connections=
max_user_connections=
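My situation is this: when I configured only max_connections, the Galera node stopped accepting further connections as soon as the limit was reached, failing with a "Too many connections" error. When MaxScale sees this connection refusal n times, it puts the server into maintenance mode. I can understand this behavior; it is expected.
When I also configure max_user_connections, the misbehaving application still keeps trying to open new connections, and once the per-user limit is reached, further connection attempts fail on the backend MariaDB node. MaxScale observes these failures and again puts the server into maintenance mode. I believe that during this window it sees connection attempts only from the bad client, and no other application is trying to connect.
In this way, MaxScale puts all three nodes into maintenance mode over time, which makes the entire database service unavailable.
For me as the administrator, the outcome is the same either way: setting a per-user limit achieves nothing. I have two questions:
Q1. How can I prevent the connection failures of a single user from putting the backend MariaDB nodes into maintenance?
Q2. Are there any documents, tutorials, or articles on how and when MaxScale decides to put a server into maintenance mode?
Details of the environment:
Galera 25.3.23, MariaDB 10.3.12, MaxScale 2.4.11, OS RHEL 7.4 (Maipo)
Here is my configuration.
MariaDB Galera configuration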
[server]

# this is only for the mysqld standalone daemon
[mysqld]
#user statistics
userstat=1
performance_schema
#wait_timeout=600
max_allowed_packet=1024M
#
lower_case_table_names=1
#
max_connections=1500
max_user_connections=200
#
# * Galera-related settings
#
[galera]
# Mandatory settings
wsrep_on=ON
wsrep_provider=/usr/lib64/galera/libgalera_smm.so
wsrep_provider_options="gcache.size=300M; gcache.page_size=300M; pc.ignore_sb=false; pc.ignore_quorum=false"
#wsrep_cluster_address defines members of the cluster
wsrep_cluster_address=gcomm://x.x.x.1,x.x.x.2,x.x.x.3
wsrep_cluster_name="mariadb-cluster"
wsrep_node_address=x.x.x.1
wsrep_node_incoming_address=x.x.x.1
wsrep_debug=OFF
#
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
innodb_doublewrite=1
query_cache_size=0
innodb_flush_log_at_trx_commit=0
innodb_buffer_pool_size=5G
#
bind-address=x.x.x.1
#
[mariadb]
#performance
wait_timeout=31536000
#
#query logging
log_output=FILE
#slow queries
slow_query_log
slow_query_log_file=/var/log/mariadb/mariadb-slow.log
long_query_time=10.0
log_queries_not_using_indexes=ON
min_examined_row_limit=1000
log_slow_rate_limit=1
log_slow_verbosity=query_plan,explain
#
#error logs
log_error=/var/log/mariadb/mariadb-error.log
log_warnings=2
All three Galera nodes are configured the same way.
MaxScale configuration
[maxscale]
threads=auto

# Server definitions
[mariadb1]
type=server
address=x.x.x.1
port=3306
protocol=MariaDBBackend
#priority=0

[mariadb2]
type=server
address=x.x.x.2
port=3306
protocol=MariaDBBackend
#priority=1

[mariadb3]
type=server
address=x.x.x.3
port=3306
protocol=MariaDBBackend
#priority=1

# Monitor for the servers
#

[Galera-Monitor]
type=monitor
module=galeramon
servers=mariadb1, mariadb2, mariadb3
user=xxx
password=xxx
#disable_master_role_setting=true
monitor_interval=1000
#use_priority=true
#
disable_master_failback=true
available_when_donor=true

# Service definitions

[Galera-Service]
type=service
router=readwritesplit
master_accept_reads=true
connection_keepalive=300s
master_reconnection=true
master_failure_mode=error_on_write
connection_timeout=3600s
servers=mariadb1, mariadb2, mariadb3
user=xxx
password=xxx
#filters=Query-Log-Filter

#Listener

[Galera-Listener]
type=listener
service=Galera-Service
protocol=MariaDBClient
port=4306

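I tried configuring connection_timeout, max_connections, and max_user_connections on the database server nodes, but it did not help. When the bad application makes connection attempts and the threshold is reached, the database server rejects them with "Too many connections". MaxScale observes this for some time and then puts the backend server into Maintenance. Setting max_user_connections to some value, say 200, makes the backend server refuse connections once a single user reaches the limit. When "Too many connections" failures then occur repeatedly because the bad application breaches the max_user_connections threshold, MaxScale again marks the server as being in the Maintenance state. It does not distinguish whether the attempts come from a single user or from many users collectively. It only sees "Too many connections" failures from the server.
To work around this, I created a separate service in MaxScale for the misbehaving application, with a max_connections limit set on it, and a separate listener for that service on a different port.
Because the service is separate, other clients are unaffected when max_connections reaches its threshold on MaxScale. Also note that the max_connections limit on the backend MariaDB servers is greater than the value configured on MaxScale, so the threshold on MaxScale is reached first and it never puts the backend servers into Maintenance mode. The new MaxScale configuration blocks are as follows: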
[Galera-Service]
type=service
router=readwritesplit
master_accept_reads=true
connection_keepalive=300s
master_reconnection=true
master_failure_mode=error_on_write
connection_timeout=300s
max_connections=2500
servers=mariadb1, mariadb2, mariadb3
user=user
password=password


[Galera-Service-Bad-App]
type=service
router=readwritesplit
master_accept_reads=true
connection_keepalive=300s
master_reconnection=true
master_failure_mode=error_on_write
connection_timeout=300s
max_connections=250
servers=mariadb1, mariadb2, mariadb3
user=user
password=password
#

[Galera-Listener]
type=listener
service=Galera-Service
protocol=MariaDBClient
port=4306

[Galera-Listener-astro]
type=listener
service=Galera-Service-Bad-App
protocol=MariaDBClient
port=4307
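
The relationship described above (the MaxScale limit must be hit before the backend limit) can be checked with a few lines of arithmetic. The following is only a minimal sketch under a stated assumption: each client session on a service holds at most one connection to any given backend, so the worst case a single backend can see is the sum of the service limits. The function names are hypothetical; the numbers are the ones shown in this post (1500 from the MariaDB configuration in the question, 2500 and 250 from the two services).

```python
def worst_case_backend_connections(service_limits):
    """Upper bound on connections one backend can receive from MaxScale.

    Assumption: every client session opens at most one connection
    to any given backend server.
    """
    return sum(service_limits.values())


def fits_backend(service_limits, backend_max_connections, reserved=0):
    """True if the MaxScale limits are reached before the backend limit.

    `reserved` is optional headroom for monitor and admin connections.
    """
    total = worst_case_backend_connections(service_limits) + reserved
    return total < backend_max_connections


backend_max = 1500  # max_connections from the MariaDB config in the question
services = {"Galera-Service": 2500, "Galera-Service-Bad-App": 250}

# The bad-app service on its own stays below the backend limit.
print(fits_backend({"Galera-Service-Bad-App": 250}, backend_max))
# Both services combined do not, unless the backend limit is raised.
print(fits_backend(services, backend_max))
```

With these particular values the isolated service fits comfortably, while the combined limits would only stay below the backend threshold if max_connections on the MariaDB nodes were raised above the value shown in the question.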

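I don't think MaxScale is the component you want to use to solve this. It can be done in the MariaDB server itself. I ran into exactly the same problem and solved it by using the max_user_connections setting to impose a limit on the database user.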
wait_timeout=31536000
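Why is this value so large? Does your application keep connections open instead of creating new ones? While that may sound like a good idea, it means that connections accidentally left open or idle will not be closed until much later.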
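
To put the size of that value in perspective, here is a small sketch that converts the timeout into days. The helper name is hypothetical; 31536000 is the configured wait_timeout, and 600 is the commented-out value from the same configuration file.

```python
from datetime import timedelta


def timeout_as_timedelta(seconds):
    """Convert a timeout given in seconds into a timedelta."""
    return timedelta(seconds=seconds)


configured = timeout_as_timedelta(31536000)  # wait_timeout in [mariadb]
commented_out = timeout_as_timedelta(600)    # "#wait_timeout=600" in [mysqld]

print(configured.days)                       # idle lifetime in days
print(commented_out.total_seconds() / 60)    # idle lifetime in minutes
```

An idle connection is therefore allowed to sit on the server for a full year, compared with ten minutes under the commented-out setting.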
"For me as the administrator, the outcome is the same either way: setting a per-user limit achieves nothing."
I don't think that is correct.
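"Q1. How can I prevent the connection failures of a single user from putting the backend MariaDB nodes into maintenance?"
If you limit the database users such that the sum of max_user_connections over all users is less than max_connections on each node, the users can never reach the max_connections limit.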
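
The condition above is simple arithmetic and can be sketched as follows. The function names and the account names are hypothetical; 1500 and 200 are the max_connections and max_user_connections values from the question. The second helper assumes every account is subject to the same limit, which is the case when only the global max_user_connections variable is set.

```python
def users_cannot_exhaust(per_user_limits, max_connections):
    """True if the per-user limits sum to less than max_connections."""
    return sum(per_user_limits.values()) < max_connections


def max_accounts_at_uniform_limit(max_connections, per_user_limit):
    """Largest number of accounts whose combined limit stays strictly
    below max_connections when every account has the same limit."""
    return (max_connections - 1) // per_user_limit


# Hypothetical accounts, each capped at 200 connections.
limits = {"app_a": 200, "app_b": 200, "bad_app": 200}

print(users_cannot_exhaust(limits, 1500))
print(max_accounts_at_uniform_limit(1500, 200))
```

With three accounts at 200 each the total is 600, well under 1500, so a single runaway account is stopped by its own limit long before the node-wide limit comes into play.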
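"Q2. Are there any documents, tutorials, or articles on how and when MaxScale decides to put a server into maintenance mode?"
I don't think there is a single document; the information is scattered across the MaxScale documentation. I believe maintenance mode was originally intended as a way for administrators to schedule planned downtime, but it has since been used for other things as well; see, for example, maintenance_on_low_disk_space.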

Source: https://dba.stackexchange.com/questions/286204
