赞
踩
http://www.vmcd.org/2012/04/mysql-high-availability-mha/
http://ylw6006.blog.51cto.com/470441/890360/
功能
Auto failover—定期ping master(执行select 1),连续3次失败则开始failover,将binlog gap导出并scp至latest slave,恢复后提升为master;
交互式—不监控master,出故障时人工调用MHA;
非交互式—不监控master仍能自动故障转移,使用第3方软件(heartbeat/pacemaker)检测master;
在线切换master—期间数据无法写入,持续0.5-2s,大致步骤:
检测cnf确定current master;确定new master;阻塞current master写;等待slave sync;提升new master,恢复写;
注:MHA每次都是通过检查cnf确认current master,如何找出当前master? Show slave status
Identifying the current master by connecting all hosts described in a config file. You do not have to specify which host is the current master. MHA automatically checks replication settings and identify the current master.
由此带来一个问题:应用架构如何自动识别master,以及负载均衡的分配比例
管理节点的角色类似于oracle数据库中的fast start failover中的observer,但mha上层可以通过keepalive部署VIP,程序连接数据库使用VIP,从而实现后台数据库的故障切换透明化
脚本
save_binary_logs:保存和复制当掉的主服务器二进制日志;
apply_diff_relay_logs:识别差异的relay log事件,并应用于其他salve服务器;
purge_relay_logs:清除relay log文件;
常见问题
1 manager进程down怎么办
不影响mysql,但不会再自动failover;
Current recommended MHA Manager HA solution is using common active/standby solutions like Pacemaker.
2
每次failover切换后会在管理目录生成文件app1.failover.complete ,下次在切换的时候会发现有这个文件导致切换不成功,需要手动清理掉。
rm -rf /masterha/app1/app1.failover.complete
也可以加上参数–ignore_last_failover
3增删节点
更新application config配置文件,重启manager进程
masterha_conf_host通过指定选项自动更新conf
[server_db101]
4
当有slave节点宕掉的情况是启动不了的,加上-ignore_fail_on_start即使有节点宕掉也能启动mha
nohup masterha_manager –conf=/etc/app1.cnf –ignore_fail_on_start < /dev/null
所有节点配备root ssh;
不支持3-tier,m1-m2-s3,不监控s3;conf配置m1/m2,所有主机设置multi_tier_slave=1
Slave必须开启log-bin;
所有服务器的过滤规则必须相同;
候选slave上必须有复制用户;
保留relay log(默认自动删除),purge_relay_logs脚本逻辑:
为relay log创建硬链接,relay_log_purge=1(自动清除relay log);等待sql thread读取新relay log;relay_log_purge=0(声明--disable_relay_log_purge);
设置crontab
[app@slave_host1]$ cat /etc/cron.d/purge_relay_logs
# purge relay logs at 5am
0 5 * * * app /usr/bin/purge_relay_logs --user=root --password=PASSWORD --disable_relay_log_purge >> /var/log/masterha/purge_relay_logs.log 2>&1
安装
1 配置root ssh等价性
2 安装perl DBD::mysql
3 安装MHA node – perl Makefile.PL; make; make install;
4 安装MHA manager - 同上;可单独机器,也可与MHA node同机;
5 配置文件,分为[server default]全局,以及[server]local
[server default]
manager_workdir=/masterha/app1 --工作目录
manager_log=/masterha/app1/manager.log --日志
user=root
password=oracle
ssh_user=root
repl_user=rep
repl_password=rep
ping_interval=1 --检查间隔
[server1]
hostname=db-181
master_binlog_dir=/data/mysql
candidate_master=1
[server2]
hostname=db-183
master_binlog_dir=/data/mysql
candidate_master=1
[server3]
hostname=db-184
master_binlog_dir=/data/mysql
candidate_master:=1为候选master,如指定多个则[server_1]优先级高于[server_2];默认latest slave被提升为new master;
no_master:=1该slave永远不会成为master
ignore_fail:=1即便该slave fails,MHA仍旧继续failover
master_binlog_dir:binlog生成目录,一旦master die用于读取剩余binlog
manager_workdir:MHA manager生成的各种状态文件,默认为/var/tmp
manager_log:MHA manager的日志目录
multi_tier_slave:不会abort 3-tier,仅忽略3rd timer;
ping_interval:MHA manager ping master的频率,连续错失3次则认为master挂了;默认3s;
ping_type:默认manager对master建立持续连接,每次运行select 1确认master是否down;0.53引入,SELECT选项则每次建立新连接;
secondary_check_script:默认manager只检查manager-master route,此脚本可检查manager – slave1 – master;
master_ip_failover_script:一般HA会为master多分一个VIP,一旦master崩溃,VIP漂移到new master;也可以维护一个catalog数据库,记录application同IP之间的映射,一旦failover需要更新;此参数允许用户自订制,实例位于/samples/scripts/master_ip_failover
有3个阶段会调用: 1 进入master monitoring 2 调用shutdown_script之前 3 new master应用relay log之后
Master_ip_online_change_script:用于online change;
Shutdown_script: 用于关闭master避免脑裂
MHA manager功能顺序
启动 nohup masterha_manager --conf=/etc/masterha/app1.cnf > /tmp/mha_manager.log &
failover
1 读取conf连接所有host判断current master, how(所有host的repl用户名/密码相同?no,[server1]可定制配置)
2 监控master,若连续3次失败则failover
3 重新读取conf
4 关闭master(可选),shutdown_script/ip_failover_script
5 选定new master,应用relay log后激活
6 恢复其余slave
7 通知(可选),report_script
Online switch
1 确定current master
2 选举new master
2 阻塞current master写操作,flush tables with read lock
3 等待slave sync, master_log_pos()
4 将new master置为可写
5 对剩余slave执行change master/start slave
https://code.google.com/p/mysql-master-ha/wiki/TableOfContents?tm=6
internal
1 如何获取relay log position
MHA直接解析binlog header,而mysqlbinlog会连同binlog body一起输出;
Header存储event type/length/end pos,可mysqlbinlog –base64-output=always再次确认
2 MHA并非从头开始顺序解析binlog,可直接跳至end pos
$latest_mlf = Master_Log_File on the latest slave
$target_mlf = Master_Log_File on the recovery target slave
$latest_rmlp = Read_Master_Log_Pos on the latest slave
$target_rmlp = Read_Master_Log_Pos on the recovery target slave
$offset = $latest_rmlp - $target_rmlp
$filesize = File size of the latest relay log file on the latest slave
if ($latest_mlf eq $target_mlf) && ($filesize > $offset), and if binlog events can be readable from ($filesize - $offset) position, MHA decides that starting recovery position is from ($filesize - $offset) position from the latest relay log file on the latest slave.
3 整合binlog gap
每个slave恢复有3个过程:应用exec_master_log_pos到read_master_log_pos之间的gap;应用read pos到latest slave’s read pos之间的gap;应用latest slave到master之间binlog的gap;
A target slave的exec pos到read pos
B target slave的read pos到latest slave的read pos(剪除开头的格式化事件)
C latest slave的read pos到master的tail binlog
D 将上述输出整合到同一个文件,执行mysqlbinlog应用到target slave
A-C全是binlog的原始信息(而非mysqlbinlog output),此举是为了支持row-based,同一个row event可能跨binlog存储(对单个文件执行mysqlbinlog会得到不完整信息)
Mysqlbinlog默认输出结尾会添加rollback,整合成一个文件避免了跨binlog事务中间被回滚的风险,因为格式化事件可能产生ROLLBACK所以B-C去除;
案例
1
[root@dg55 ~]# tail -f /masterha/app1/manager.log
Wed Jun 6 14:50:48 2012 - [info]
192.168.123.13 (current master)
+--192.168.123.14
Wed Jun 6 14:50:48 2012 - [warning] master_ip_failover_script is not defined.
Wed Jun 6 14:50:48 2012 - [warning] shutdown_script is not defined.
Wed Jun 6 14:50:48 2012 - [info] Set master ping interval 1 seconds.
Wed Jun 6 14:50:48 2012 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Wed Jun 6 14:50:48 2012 - [info] Starting ping health check on 192.168.123.13(192.168.123.13:3306)..—定期检测master mysql,连续3次失败则读取conf进行重组
Wed Jun 6 14:50:48 2012 - [info] Ping succeeded, sleeping until it doesn't respond..
Wed Jun 6 14:51:32 2012 - [warning] Got error on MySQL ping: 2006 (MySQL server has gone away)
Wed Jun 6 14:51:32 2012 - [info] HealthCheck: SSH to 192.168.123.13 is reachable.
Wed Jun 6 14:51:33 2012 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Wed Jun 6 14:51:33 2012 - [warning] Connection failed 1 time(s)..
Wed Jun 6 14:51:34 2012 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Wed Jun 6 14:51:34 2012 - [warning] Connection failed 2 time(s)..
Wed Jun 6 14:51:35 2012 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Wed Jun 6 14:51:35 2012 - [warning] Connection failed 3 time(s)..
Wed Jun 6 14:51:35 2012 - [warning] Master is not reachable from health checker!
Wed Jun 6 14:51:35 2012 - [warning] Master 192.168.123.13(192.168.123.13:3306) is not reachable!
Wed Jun 6 14:51:35 2012 - [warning] SSH is reachable.
Wed Jun 6 14:51:35 2012 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Wed Jun 6 14:51:35 2012 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 6 14:51:35 2012 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Wed Jun 6 14:51:35 2012 - [info] Reading server configurations from /etc/masterha/app1.cnf..
Wed Jun 6 14:51:35 2012 - [info] Dead Servers:
Wed Jun 6 14:51:35 2012 - [info] 192.168.123.13(192.168.123.13:3306)
Wed Jun 6 14:51:35 2012 - [info] Alive Servers:
Wed Jun 6 14:51:35 2012 - [info] 192.168.123.14(192.168.123.14:3306)
Wed Jun 6 14:51:35 2012 - [info] Alive Slaves:
Wed Jun 6 14:51:35 2012 - [info] 192.168.123.14(192.168.123.14:3306) Version=5.5.25-log (oldest major version between slaves) log-bin:enabled
Wed Jun 6 14:51:35 2012 - [info] Replicating from 192.168.123.13(192.168.123.13:3306)
Wed Jun 6 14:51:35 2012 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jun 6 14:51:35 2012 - [info] Checking slave configurations..
Wed Jun 6 14:51:35 2012 - [warning] read_only=1 is not set on slave 192.168.123.14(192.168.123.14:3306).
Wed Jun 6 14:51:35 2012 - [warning] relay_log_purge=0 is not set on slave 192.168.123.14(192.168.123.14:3306).
Wed Jun 6 14:51:35 2012 - [info] Checking replication filtering settings..
Wed Jun 6 14:51:35 2012 - [info] Replication filtering check ok.
Wed Jun 6 14:51:35 2012 - [info] Master is down!
Wed Jun 6 14:51:35 2012 - [info] Terminating monitoring script.
Wed Jun 6 14:51:35 2012 - [info] Got exit code 20 (Master dead).
Wed Jun 6 14:51:35 2012 - [info] MHA::MasterFailover version 0.52.
Wed Jun 6 14:51:35 2012 - [info] Starting master failover.
Wed Jun 6 14:51:35 2012 - [info]
Wed Jun 6 14:51:35 2012 - [info] * Phase 1: Configuration Check Phase..—重构分为1检查配置 2关闭dead master 3 获取最新slave
Wed Jun 6 14:51:35 2012 - [info]
Wed Jun 6 14:51:35 2012 - [info] Dead Servers:
Wed Jun 6 14:51:35 2012 - [info] 192.168.123.13(192.168.123.13:3306)
Wed Jun 6 14:51:35 2012 - [info] Checking master reachability via mysql(double check)..
Wed Jun 6 14:51:35 2012 - [info] ok.
Wed Jun 6 14:51:35 2012 - [info] Alive Servers:
Wed Jun 6 14:51:35 2012 - [info] 192.168.123.14(192.168.123.14:3306)
Wed Jun 6 14:51:35 2012 - [info] Alive Slaves:
Wed Jun 6 14:51:35 2012 - [info] 192.168.123.14(192.168.123.14:3306) Version=5.5.25-log (oldest major version between slaves) log-bin:enabled
Wed Jun 6 14:51:35 2012 - [info] Replicating from 192.168.123.13(192.168.123.13:3306)
Wed Jun 6 14:51:35 2012 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jun 6 14:51:35 2012 - [info] ** Phase 1: Configuration Check Phase completed.
Wed Jun 6 14:51:35 2012 - [info]
Wed Jun 6 14:51:35 2012 - [info] * Phase 2: Dead Master Shutdown Phase..
Wed Jun 6 14:51:35 2012 - [info]
Wed Jun 6 14:51:35 2012 - [info] Forcing shutdown so that applications never connect to the current master..
Wed Jun 6 14:51:35 2012 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master ip address.
Wed Jun 6 14:51:35 2012 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Wed Jun 6 14:51:35 2012 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Wed Jun 6 14:51:35 2012 - [info]
Wed Jun 6 14:51:35 2012 - [info] * Phase 3: Master Recovery Phase..
Wed Jun 6 14:51:35 2012 - [info]
Wed Jun 6 14:51:35 2012 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Wed Jun 6 14:51:35 2012 - [info]
Wed Jun 6 14:51:35 2012 - [info] The latest binary log file/position on all slaves is mysql-bin.000021:107
Wed Jun 6 14:51:35 2012 - [info] Latest slaves (Slaves that received relay log files to the latest):
Wed Jun 6 14:51:35 2012 - [info] 192.168.123.14(192.168.123.14:3306) Version=5.5.25-log (oldest major version between slaves) log-bin:enabled
Wed Jun 6 14:51:35 2012 - [info] Replicating from 192.168.123.13(192.168.123.13:3306)
Wed Jun 6 14:51:35 2012 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jun 6 14:51:35 2012 - [info] The oldest binary log file/position on all slaves is mysql-bin.000021:107
Wed Jun 6 14:51:35 2012 - [info] Oldest slaves:
Wed Jun 6 14:51:35 2012 - [info] 192.168.123.14(192.168.123.14:3306) Version=5.5.25-log (oldest major version between slaves) log-bin:enabled
Wed Jun 6 14:51:35 2012 - [info] Replicating from 192.168.123.13(192.168.123.13:3306)
Wed Jun 6 14:51:35 2012 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jun 6 14:51:35 2012 - [info]
Wed Jun 6 14:51:35 2012 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase.. save_binary_logs脚本将尚未传送的slave的binlog部分转义出来,并scp传送给slave
Wed Jun 6 14:51:35 2012 - [info]
Wed Jun 6 14:51:35 2012 - [info] Fetching dead master's binary logs..
Wed Jun 6 14:51:35 2012 - [info] Executing command on the dead master 192.168.123.13(192.168.123.13:3306): save_binary_logs --command=save --start_file=mysql-bin.000021 --start_pos=107 --binlog_dir=/mydata --output_file=/var/tmp/saved_master_binlog_from_192.168.123.13_3306_20120606145135.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.52
Creating /var/tmp if not exists.. ok.
Concat binary/relay logs from mysql-bin.000021 pos 107 to mysql-bin.000021 EOF into /var/tmp/saved_master_binlog_from_192.168.123.13_3306_20120606145135.binlog ..
Dumping binlog format description event, from position 0 to 107.. ok.
Dumping effective binlog data from /mydata/mysql-bin.000021 position 107 to tail(126).. ok.
Concat succeeded.
Wed Jun 6 14:51:36 2012 - [info] scp from root@192.168.123.13:/var/tmp/saved_master_binlog_from_192.168.123.13_3306_20120606145135.binlog to local:/masterha/app1/saved_master_binlog_from_192.168.123.13_3306_20120606145135.binlog succeeded.
Wed Jun 6 14:51:36 2012 - [info] HealthCheck: SSH to 192.168.123.14 is reachable.
Wed Jun 6 14:51:37 2012 - [info]
Wed Jun 6 14:51:37 2012 - [info] * Phase 3.3: Determining New Master Phase..
Wed Jun 6 14:51:37 2012 - [info]
Wed Jun 6 14:51:37 2012 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Wed Jun 6 14:51:37 2012 - [info] All slaves received relay logs to the same position. No need to resync each other.
Wed Jun 6 14:51:37 2012 - [info] Searching new master from slaves..
Wed Jun 6 14:51:37 2012 - [info] Candidate masters from the configuration file:
Wed Jun 6 14:51:37 2012 - [info] 192.168.123.14(192.168.123.14:3306) Version=5.5.25-log (oldest major version between slaves) log-bin:enabled
Wed Jun 6 14:51:37 2012 - [info] Replicating from 192.168.123.13(192.168.123.13:3306)
Wed Jun 6 14:51:37 2012 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jun 6 14:51:37 2012 - [info] Non-candidate masters:
Wed Jun 6 14:51:37 2012 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Wed Jun 6 14:51:37 2012 - [info] New master is 192.168.123.14(192.168.123.14:3306)
Wed Jun 6 14:51:37 2012 - [info] Starting master failover..
Wed Jun 6 14:51:37 2012 - [info]
From:
192.168.123.13 (current master)
+--192.168.123.14
To:
192.168.123.14 (new master)
Wed Jun 6 14:51:37 2012 - [info]
Wed Jun 6 14:51:37 2012 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Wed Jun 6 14:51:37 2012 - [info]
Wed Jun 6 14:51:37 2012 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Wed Jun 6 14:51:37 2012 - [info] Sending binlog..
Wed Jun 6 14:51:37 2012 - [info] scp from local:/masterha/app1/saved_master_binlog_from_192.168.123.13_3306_20120606145135.binlog to root@192.168.123.14:/var/tmp/saved_master_binlog_from_192.168.123.13_3306_20120606145135.binlog succeeded.
Wed Jun 6 14:51:37 2012 - [info]
Wed Jun 6 14:51:37 2012 - [info] * Phase 3.4: Master Log Apply Phase..应用binlog gap即上步转义出的binlog
Wed Jun 6 14:51:37 2012 - [info]
Wed Jun 6 14:51:37 2012 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Wed Jun 6 14:51:37 2012 - [info] Starting recovery on 192.168.123.14(192.168.123.14:3306)..
Wed Jun 6 14:51:37 2012 - [info] Generating diffs succeeded.
Wed Jun 6 14:51:37 2012 - [info] Waiting until all relay logs are applied.
Wed Jun 6 14:51:37 2012 - [info] done.
Wed Jun 6 14:51:37 2012 - [info] Getting slave status..
Wed Jun 6 14:51:37 2012 - [info] This slave(192.168.123.14)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000021:107). No need to recover from Exec_Master_Log_Pos.
Wed Jun 6 14:51:37 2012 - [info] Connecting to the target slave host 192.168.123.14, running recover script..
Wed Jun 6 14:51:37 2012 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user=root --slave_host=192.168.123.14 --slave_ip=192.168.123.14 --slave_port=3306 --apply_files=/var/tmp/saved_master_binlog_from_192.168.123.13_3306_20120606145135.binlog --workdir=/var/tmp --target_version=5.5.25-log --timestamp=20120606145135 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.52 --slave_pass=xxx
Wed Jun 6 14:51:37 2012 - [info]
Applying differential binary/relay log files /var/tmp/saved_master_binlog_from_192.168.123.13_3306_20120606145135.binlog on 192.168.123.14:3306. This may take long time...
Applying log files succeeded.
Wed Jun 6 14:51:37 2012 - [info] All relay logs were successfully applied.
Wed Jun 6 14:51:37 2012 - [info] Getting new master's binlog name and position..
Wed Jun 6 14:51:37 2012 - [info] mysql-bin.000023:107
Wed Jun 6 14:51:37 2012 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.123.14', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000023', MASTER_LOG_POS=107, MASTER_USER='r_test', MASTER_PASSWORD='xxx';
Wed Jun 6 14:51:37 2012 - [warning] master_ip_failover_script is not set. Skipping taking over new master ip address.
Wed Jun 6 14:51:37 2012 - [info] ** Finished master recovery successfully.
Wed Jun 6 14:51:37 2012 - [info] * Phase 3: Master Recovery Phase completed.
Wed Jun 6 14:51:37 2012 - [info]
Wed Jun 6 14:51:37 2012 - [info] * Phase 4: Slaves Recovery Phase.. 利用最新的slave同步剩余的slave
Wed Jun 6 14:51:37 2012 - [info]
Wed Jun 6 14:51:37 2012 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Wed Jun 6 14:51:37 2012 - [info]
Wed Jun 6 14:51:37 2012 - [info] Generating relay diff files from the latest slave succeeded.
Wed Jun 6 14:51:37 2012 - [info]
Wed Jun 6 14:51:37 2012 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Wed Jun 6 14:51:37 2012 - [info]
Wed Jun 6 14:51:37 2012 - [info] All new slave servers recovered successfully.
Wed Jun 6 14:51:37 2012 - [info]
Wed Jun 6 14:51:37 2012 - [info] * Phase 5: New master cleanup phease..提升最新的slave为master
Wed Jun 6 14:51:37 2012 - [info]
Wed Jun 6 14:51:37 2012 - [info] Resetting slave info on the new master..
Wed Jun 6 14:51:37 2012 - [info] Master failover to 192.168.123.14(192.168.123.14:3306) completed successfully.
Wed Jun 6 14:51:37 2012 - [info]
----- Failover Report -----
app1: MySQL Master failover 192.168.123.13 to 192.168.123.14 succeeded
Master 192.168.123.13 is down!
Check MHA Manager logs at dg55.yang.com:/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
The latest slave 192.168.123.14(192.168.123.14:3306) has all relay logs for recovery.
Selected 192.168.123.14 as a new master.
192.168.123.14: OK: Applying all logs succeeded.
Generating relay diff files from the latest slave succeeded.
192.168.123.14: Resetting slave info succeeded.
Master failover to 192.168.123.14(192.168.123.14:3306) completed successfully.
Pacemaker + DRBD
1 为active/passive架构,passive不承担任何任务,成本较高;
2 downtime较长,passive的buffer pool为空warm-up从0开始
3 为确保数据一致,innodb_flush_log_at_trx_commit/sync_binlog=1
4 复杂,
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/15480802/viewspace-1189043/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/15480802/viewspace-1189043/
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。