Recovering a MySQL-mmm Master from the REPLICATION_FAIL State

Shared by LinuxBoy on 2019-03-30 02:03:29

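As always, some background first.

A few days ago the website suddenly became unusable: pages showed the layout frame but no content. The application log contained these errors: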

Communications link failure
The last packet successfully received from the server was 7,875,055 milliseconds ago. The last packet sent successfully to the server was 7,875,055 milliseconds ago.

And at the very end, this line:

Caused by: java.sql.SQLException: The table 'message' is full

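That seemed impossible: with our current user volume, the database could not plausibly have filled up. So I logged in to the database server Master1 and ran df -h, which showed that /var/ was full. Only then did I remember: when the database was first set up, all data files were placed in a separate directory, and the entries under /var/lib/mysql/ were symlinks pointing there. Evidently some tables had been created later and their files were never moved to that directory. The fix was: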

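1. Stop the MySQL service with service mysql stop.

2. Move the table files to the designated directory and create symlinks in their place.

3. Start the MySQL service with service mysql start.

4. On the MySQL-mmm monitor, run mmm_control set_offline db01 and then mmm_control set_online db01 to bring master01 back online.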

Afterwards, mmm_control show reported the state as ONLINE.
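Steps 1 to 3 above could be scripted roughly as follows. This is only a sketch: the function name relocate_table and the paths in the usage comment are hypothetical (the post does not name the real target directory or the database that holds the table), and it assumes one set of files per table, as with MyISAM.

```shell
#!/bin/sh
# Sketch: move one table's files out of a full data directory and
# leave symlinks behind so MySQL still finds them at the old location.
# Usage: relocate_table SRC_DIR DEST_DIR TABLE
# DEST_DIR must be an absolute path, otherwise the symlinks will dangle.
relocate_table() {
    src_dir=$1
    dest_dir=$2
    table=$3

    mkdir -p "$dest_dir" || return 1

    for f in "$src_dir/$table".*; do
        # Skip when the glob matched nothing.
        [ -e "$f" ] || continue
        # Skip files that were already relocated (they are symlinks).
        [ -L "$f" ] && continue

        name=$(basename "$f")
        mv "$f" "$dest_dir/$name" || return 1
        ln -s "$dest_dir/$name" "$f" || return 1
    done
}

# Hypothetical usage, run as root with MySQL stopped:
#   service mysql stop
#   relocate_table /var/lib/mysql/db1 /data/mysql/db1 message
#   service mysql start
```

The files should be moved only while MySQL is stopped, and they must remain readable and writable by the MySQL user in the new location.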

Was that the end of it? NO! NO! By Qiubai convention (am I advertising for Qiubai? Absolutely not), that was not the climax yet.

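Today, while sitting in a talk, I suddenly felt like checking on MySQL-mmm. I ran mmm_control show and saw exactly what I did not want to see: db01 was in the REPLICATION_FAIL state. set_offline, set_online, and restarting the MySQL service all had no effect.

The error log on db01 showed the following: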

111104 13:19:19 [ERROR] /usr/sbin/mysqld: Table 'table1' is marked as crashed and should be repaired
111104 13:19:19 [ERROR] Slave SQL: Error 'Table 'table1' is marked as crashed and should be repaired' on query. Default database: 'db1'. Query: '...'
111104 13:19:19 [Warning] Slave: Table './db1/table1' is marked as crashed and should be repaired Error_code: 145
111104 13:19:19 [Warning] Slave: Table 'table1' is marked as crashed and should be repaired Error_code: 1194
111104 13:19:19 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin-master2.000022' position 110544518

I logged in to the database and ran:

mysql> repair table table1;
mysql> start slave;

Checking the error log again showed:

111104 13:19:19 [Note] Slave I/O thread: connected to master 'replication@db02:3306',replication started in log 'mysql-bin-master2.000022' at position 679172934
111104 13:24:18 [Note] Found 11845 of 11846 rows when repairing './db1/table1'
111104 13:27:03 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin-master2.000022' at position 110544518, relay log '/mysql/log_vol/replication/mysql-bin-master1.004525' position: 844646
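Besides reading the error log, the state of the two replication threads can be checked directly with SHOW SLAVE STATUS. The helper below is a minimal sketch: the name slave_threads_ok is made up for illustration, and it assumes the mysql client can connect without extra options.

```shell
#!/bin/sh
# Sketch: read the text of SHOW SLAVE STATUS\G on stdin and succeed
# only when both the I/O thread and the SQL thread report "Yes".
slave_threads_ok() {
    awk -F': ' '
        /^ *Slave_IO_Running:/  { io = $2 }
        /^ *Slave_SQL_Running:/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Hypothetical usage on db01:
#   mysql -e 'SHOW SLAVE STATUS\G' | slave_threads_ok \
#       && echo "replication threads running" \
#       || echo "replication is NOT running"
```

When the check fails, the Last_Error field of the same SHOW SLAVE STATUS output usually carries the message that also appears in the error log.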

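On the MySQL-mmm monitor server, db01 could be seen moving from REPLICATION_FAIL to REPLICATION_DELAY to ONLINE. After a while it was still ONLINE, so it appeared to be stable. The writer role was still on db02, though. So I ran set_offline on db02 and then set_online on db02, after which the writer role switched to db01.

Was there a climax? Heh, the problem got solved, and that is what matters :-)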
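The offline/online cycle used here can be wrapped in a small helper, sketched below. The function name bounce_host and the MMM_CONTROL variable are assumptions made for illustration; the variable only exists so the sequence can be dry-run by pointing it at echo.

```shell
#!/bin/sh
# Sketch: take a host offline and bring it back online through the
# MMM monitor, then print the cluster state.
# MMM_CONTROL is a hypothetical override, e.g. MMM_CONTROL=echo for a dry run.
MMM_CONTROL=${MMM_CONTROL:-mmm_control}

bounce_host() {
    host=$1
    "$MMM_CONTROL" set_offline "$host" || return 1
    "$MMM_CONTROL" set_online "$host" || return 1
    "$MMM_CONTROL" show
}

# Hypothetical usage on the monitor server, to push the writer role off db02:
#   bounce_host db02
```

Taking the current writer offline briefly interrupts writes while the role moves. mmm_control also has a move_role command, which may be a more direct way to relocate the writer role; it was not used in this incident.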
