MySQL主從復制 - 常見故障與處理辦法
[TOC]
測試環境版本:
[root@master data]# mysql -V
mysql Ver 8.0.30 for Linux on x86_64 (MySQL Community Server - GPL)
一、主從同步報錯
? 這種情況常發生于從庫沒有配置super_read_only=1,然后錯誤地在從庫修改了數據,導致從庫與主庫數據不一致。
要解決這類問題,通常需要在從庫執行反向操作,比如刪掉這些錯誤增改的數據,讓主從數據恢復到之前一致狀態。當然也可以先忽略錯誤,繼續同步后面的數據。
1.1 常見錯誤:1062主鍵沖突
1.1.1 模擬錯誤
- 主庫建表,插入數據
mysql> use testdb1;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> create table t1 (id int primary key,val varchar(20));
Query OK, 0 rows affected (0.02 sec)
mysql> insert into t1 values(1,'hi');
Query OK, 1 row affected (0.00 sec)
- salve檢查同步正常
mysql> use tesdb1;
ERROR 1049 (42000): Unknown database 'tesdb1'
mysql> use testdb1;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql>
mysql> select * from t1;
+----+------+
| id | val |
+----+------+
| 1 | hi |
+----+------+
1 row in set (0.00 sec)
- slave 手動添加一條數據
mysql> insert into t1 values(2,'hello');
Query OK, 1 row affected (0.00 sec)
- 主庫繼續添加數據
mysql> insert into t1 values(2,'ni hao');
Query OK, 1 row affected (0.01 sec)
- slave檢查同步報錯
mysql> show slave status \G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 172.17.0.2
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000002
Read_Master_Log_Pos: 1430
Relay_Log_File: slave-relay-bin.000002
Relay_Log_Pos: 1235
Relay_Master_Log_File: master-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB: information_schema,performance_schema,mysql,sys
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1062
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '2a94bc88-5018-11ed-9277-0242ac110002:5' at master log master-bin.000002, end_log_pos 1399. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Skip_Counter: 0
Exec_Master_Log_Pos: 1136
Relay_Log_Space: 1739
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1062
Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '2a94bc88-5018-11ed-9277-0242ac110002:5' at master log master-bin.000002, end_log_pos 1399. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Replicate_Ignore_Server_Ids:
Master_Server_Id: 80303306
Master_UUID: 2a94bc88-5018-11ed-9277-0242ac110002
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 221020 10:15:17
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 2a94bc88-5018-11ed-9277-0242ac110002:2-5
Executed_Gtid_Set: 2a94bc88-5018-11ed-9277-0242ac110002:1-4,
2c1a941e-5018-11ed-90e0-0242ac110003:1-2
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set, 1 warning (0.01 sec)
ERROR:
No query specified
?結論:可以看到拋出如下錯誤:
Slave_SQL_Running: No Last_SQL_Errno: 1062 Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '2a94bc88-5018-11ed-9277-0242ac110002:5' at master log master-bin.000002, end_log_pos 1399. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
1.1.2 解決辦法:Slave端反向處理錯誤數據?
- 根據Last_Error中的報錯信息獲取具體出錯的SQL
方法1:通過主庫binlog查詢
-- 從上面報錯可以看到,錯誤的事務是 master log master-bin.000002, end_log_pos 1399
[root@master data]# mysqlbinlog -v --base64-output=DECODE-ROWS master-bin.000002|grep 'end_log_pos 1399' -C 6
SET TIMESTAMP=1666232117/*!*/;
BEGIN
/*!*/;
# at 1293
#221020 10:15:17 server id 80303306 end_log_pos 1352 CRC32 0x5a1ace2e Table_map: `testdb1`.`t1` mapped to number 166
# at 1352
#221020 10:15:17 server id 80303306 end_log_pos 1399 CRC32 0xd290df5a Write_rows: table id 166 flags: STMT_END_F
### INSERT INTO `testdb1`.`t1`
### SET
### @1=2
### @2='ni hao'
# at 1399
#221020 10:15:17 server id 80303306 end_log_pos 1430 CRC32 0x4b78fb88 Xid = 119
結論: 可以看到end_log_pos 1399 對應的事務就是?INSERT INTO testdb1.t1??SET?@1=2?@2=‘ni hao’,這一行數據在slave端無法插入,因為主鍵沖突,所以刪除slave端主鍵沖突的這一行數據即可。
方法2 :slave端通過表查詢
Last_SQL_Errno: 1062
mysql> select * from performance_schema.replication_applier_status_by_worker where LAST_ERROR_NUMBER=1062\G;
*************************** 1. row ***************************
CHANNEL_NAME:
WORKER_ID: 1
THREAD_ID: NULL
SERVICE_STATE: OFF
LAST_ERROR_NUMBER: 1062
LAST_ERROR_MESSAGE: Worker 1 failed executing transaction '2a94bc88-5018-11ed-9277-0242ac110002:5' at master log master-bin.000002, end_log_pos 1399; Could not execute Write_rows event on table testdb1.t1; Duplicate entry '2' for key 't1.PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log master-bin.000002, end_log_pos 1399
LAST_ERROR_TIMESTAMP: 2022-10-20 10:15:17.969478
LAST_APPLIED_TRANSACTION: 2a94bc88-5018-11ed-9277-0242ac110002:4
LAST_APPLIED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2022-10-20 10:13:41.920825
LAST_APPLIED_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2022-10-20 10:13:41.920825
LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP: 2022-10-20 10:13:41.921418
LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP: 2022-10-20 10:13:41.936177
APPLYING_TRANSACTION: 2a94bc88-5018-11ed-9277-0242ac110002:5
APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP: 2022-10-20 10:15:17.968554
APPLYING_TRANSACTION_IMMEDIATE_COMMIT_TIMESTAMP: 2022-10-20 10:15:17.968554
APPLYING_TRANSACTION_START_APPLY_TIMESTAMP: 2022-10-20 10:15:17.969275
LAST_APPLIED_TRANSACTION_RETRIES_COUNT: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
LAST_APPLIED_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
APPLYING_TRANSACTION_RETRIES_COUNT: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_NUMBER: 0
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_MESSAGE:
APPLYING_TRANSACTION_LAST_TRANSIENT_ERROR_TIMESTAMP: 0000-00-00 00:00:00.000000
1 row in set (0.00 sec)
ERROR:
No query specified
結論: 可以看到 Could not execute Write_rows event on table testdb1.t1; Duplicate entry ‘2’ for key ‘t1.PRIMARY’,,因為主鍵沖突,且沖突的是主鍵=2的這一行,所以刪除slave端主鍵沖突的這一行數據即可。
- 定位出錯誤語句后,在從庫執行反向操作,我們這里刪除主鍵沖突的這一行,然后重新啟動slave進程。
mysql> delete from t1 where id=2;
Query OK, 1 row affected (0.00 sec)
mysql> start slave;
Query OK, 0 rows affected, 1 warning (0.01 sec)
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 172.17.0.2
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000002
Read_Master_Log_Pos: 1430
Relay_Log_File: slave-relay-bin.000002
Relay_Log_Pos: 1529
Relay_Master_Log_File: master-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: information_schema,performance_schema,mysql,sys
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1430
Relay_Log_Space: 1739
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 80303306
Master_UUID: 2a94bc88-5018-11ed-9277-0242ac110002
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 2a94bc88-5018-11ed-9277-0242ac110002:2-5
Executed_Gtid_Set: 2a94bc88-5018-11ed-9277-0242ac110002:1-5,
2c1a941e-5018-11ed-90e0-0242ac110003:1-3
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set, 1 warning (0.01 sec)
ERROR:
No query specified
mysql> select * from t1;
+----+--------+
| id | val |
+----+--------+
| 1 | hi |
| 2 | ni hao |
+----+--------+
2 rows in set (0.00 sec)
結論:主從同步恢復正常。
1.2 常見錯誤:1032更改的數據不存在
1.2.1 模擬錯誤
- slave 手動刪除一條數據
mysql> delete from t1 where id=2;
Query OK, 1 row affected (0.00 sec)
- 主庫更新數據
mysql> update t1 set val='hello' where id=2;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
- salve 報錯
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 172.17.0.2
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000002
Read_Master_Log_Pos: 1745
Relay_Log_File: slave-relay-bin.000002
Relay_Log_Pos: 1529
Relay_Master_Log_File: master-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB: information_schema,performance_schema,mysql,sys
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1032
Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '2a94bc88-5018-11ed-9277-0242ac110002:6' at master log master-bin.000002, end_log_pos 1714. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Skip_Counter: 0
Exec_Master_Log_Pos: 1430
Relay_Log_Space: 2054
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1032
Last_SQL_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '2a94bc88-5018-11ed-9277-0242ac110002:6' at master log master-bin.000002, end_log_pos 1714. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
Replicate_Ignore_Server_Ids:
Master_Server_Id: 80303306
Master_UUID: 2a94bc88-5018-11ed-9277-0242ac110002
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 221020 12:57:36
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 2a94bc88-5018-11ed-9277-0242ac110002:2-6
Executed_Gtid_Set: 2a94bc88-5018-11ed-9277-0242ac110002:1-5,
2c1a941e-5018-11ed-90e0-0242ac110003:1-4
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set, 1 warning (0.00 sec)
ERROR:
No query specified
結論:可以看到拋出如下錯誤:
Slave_SQL_Running: No Last_Errno: 1032 Last_Error: Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction '2a94bc88-5018-11ed-9277-0242ac110002:6' at master log master-bin.000002, end_log_pos 1714. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.
1.2.2 解決辦法:跳過錯誤的事務?
從slave status中可以看到執行報錯的事務是failed executing transaction ‘2a94bc88-5018-11ed-9277-0242ac110002:6’,所以需要跳過該事務。
-- GTID模式下跳過事務
mysql> stop slave;
Query OK, 0 rows affected, 1 warning (0.01 sec)
mysql> set gtid_next='2a94bc88-5018-11ed-9277-0242ac110002:6';
Query OK, 0 rows affected (0.00 sec)
mysql> begin;commit;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
mysql> set gtid_next='AUTOMATIC';
Query OK, 0 rows affected (0.00 sec)
mysql> start slave;
Query OK, 0 rows affected, 1 warning (0.01 sec)
mysql> show slave status \G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 172.17.0.2
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000002
Read_Master_Log_Pos: 1745
Relay_Log_File: slave-relay-bin.000003
Relay_Log_Pos: 463
Relay_Master_Log_File: master-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: information_schema,performance_schema,mysql,sys
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1745
Relay_Log_Space: 2360
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 80303306
Master_UUID: 2a94bc88-5018-11ed-9277-0242ac110002
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 2a94bc88-5018-11ed-9277-0242ac110002:2-6
Executed_Gtid_Set: 2a94bc88-5018-11ed-9277-0242ac110002:1-6,
2c1a941e-5018-11ed-90e0-0242ac110003:1-4
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set, 1 warning (0.01 sec)
ERROR:
No query specified
結論:跳過錯誤的事務后,恢復同步狀態。?
-- 主庫繼續插入數據
mysql> insert into t1 values(3,'good luck');
Query OK, 1 row affected (0.00 sec)
-- slave端同步正常
mysql> select * from t1;
+----+-----------+
| id | val |
+----+-----------+
| 1 | hi |
| 3 | good luck |
+----+-----------+
2 rows in set (0.00 sec)
1.3 其他主從報錯解決辦法?
a. 設置參數 sql_slave_skip_counter 跳過當前錯誤
set global sql_slave_skip_counter=1; -- 1跳過當前出錯事務
slave_skip_errors=all; --all跳過所有出錯事務
注意: 在GTID模式下,不支持該方式。?
-- 5.7版本執行會拋出如下錯誤:
mysql> set global sql_slave_skip_counter=1;
ERROR 1858 (HY000): sql_slave_skip_counter can not be set when the server is running with @@GLOBAL.GTID_MODE = ON. Instead, for each transaction that you want to skip, generate an empty transaction with the same GTID as the transaction
-- 8.0版本,不報錯,但是有2 warnings,也不能解決問題。
mysql> set global sql_slave_skip_counter=1;
Query OK, 0 rows affected, 2 warnings (0.02 sec)
b. 設置參數slave_skip_errors 跳過指定類型的錯誤
使用GTID復制從庫跳過錯誤,不支持設置sql_slave_skip_counter,但是支持設置參數slave_skip_errors(read only variable),且只能在參數文件中設置。
slave_skip_errors選項有四個可用值,分別為: off,all,ErorCode,ddl_exist_errors 。
my.cnf中的寫法有:
slave_skip_errors=1062,1053
slave_skip_errors=all -- 跳過所有錯誤,不建議使用
slave_skip_errors=ddl_exist_errors
?該參數不支持在線修改:
mysql> set global slave_skip_errors=1062;
ERROR 1238 (HY000): Variable 'slave_skip_errors' is a read only variable
c. 使用pt-slave-restart工具
示例1:跳過1個錯誤 ,ip 為從庫ip地址
[root@mysql]# pt-slave-restart --user=root --password=xxx --host=172.17.0.3 --port=3307 --skip-count=1
示例2:跳過錯誤代碼為1062的錯誤 ,ip 為從庫ip地址
[root@mysql]# pt-slave-restart --user=root --password=xxx --host=172.17.0.3 --port=3307 --error-numbers=1062
二、主從數據不一致
? 導致主庫數據不一致的根本原因就是直接在從庫上進行增刪改操作,比如上面測試中,我們通過跳過事務恢復同步,導致主庫和備庫數據不一致。
--主庫:
mysql> select * from testdb1.t1;
+----+-----------+
| id | val |
+----+-----------+
| 1 | hi |
| 2 | hello |
| 3 | good luck |
+----+-----------+
3 rows in set (0.01 sec)
--slave:
mysql> select * from testdb1.t1;
+----+-----------+
| id | val |
+----+-----------+
| 1 | hi |
| 3 | good luck |
+----+-----------+
2 rows in set (0.00 sec)
解決辦法?
1. 通過mysqldump重新初始化表
略。。。
2. 使用pt-table-sync主從數據修復
pt-table-sync是Percona-Toolkit工具集中的一個組件,主要用于表數據的高效同步。
使用對兩個庫不一致的數據進行同步,他能夠自動發現兩個實例間不一致的數據,然后進行sync操作,但是pt-table-sync無法同步表結構和索引等對象,只能同步數據。
NOTE1: 如果是sync主從數據,只有當需要sync的表都有唯一鍵(主鍵或唯一索引),才能使用–sync-to-master and/or --replicate。當使用–replicate或者–sync-to-master參數來對從庫數據進行同步時,如果主從數據不一致,不會直接在從庫上修改不一致的數據,而是在主庫上進行replace 操作(需要表有唯一鍵),通過binlog把動作傳遞給從庫,從而達到修改從庫數據的目的。當然在主庫上做的修改應該是no-op的,并不會修改主庫表的當前值,只對從庫數據產生影響。
NOTE2: 沒有唯一鍵,那么只能通過直接修改從庫上的數據來實現主從數據的同步。(不能使用–replicate或者–sync-to-master)
沒有主鍵的情況下,幾種修復方式:
a) 同步所有的庫和表
[root@mysql57 ~]# pt-table-sync --charset=utf8 --ignore-databases=mysql,sys,percona dsn=u=root,p=xxx,h=172.17.0.2,P=3306 dsn=u=root,p=xxx,h=172.17.0.3,P=3307 --execute --print --no-check-slave
b) 可以指定多個從庫,命令里有多個ip,第一次出現的是Master的地址,第2個是Slave的地址,第3個也是slave地址…
[root@mysql57 ~]# pt-table-sync --charset=utf8 --ignore-databases=mysql,sys,percona dsn=u=root,p=xxx,h=172.17.0.2,P=3306 dsn=u=root,p=xxx,h=172.17.0.3,P=3307 dsn=u=root,p=xxx,h=172.172.0.4,P=3308 --execute –print --no-check-slave
c) 只對指定的庫進行sync
[root@mysql57 ~]# pt-table-sync --charset=utf8 --ignore-databases=mysql,sys,percona --databases=test1 --no-check-slave dsn=u=root,p=xxx,h=172.17.0.2,P=3306 dsn=u=root,p=xxx,h=172.17.0.3,P=3307 --execute --print --no-check-slave
d) 只對指定的表進行sync ?修復我們當前環境不同步的t1表?
[root@slave ~]# pt-table-sync --charset=utf8mb4 --ignore-databases=mysql,sys,percona --databases=testdb1 --tables=t1 --no-check-slave dsn=u=root,p=lhr,h=172.17.0.2,P=3306 dsn=u=root,p=lhr,h=172.17.0.3,P=3307 --execute --print --no-check-slave
*******************************************************************
# A software update is available:
INSERT INTO `testdb1`.`t1`(`id`, `val`) VALUES ('2', 'hello') /*percona-toolkit src_db:testdb1 src_tbl:t1 src_dsn:A=utf8mb4,P=3306,h=172.17.0.2,p=... dst_db:testdb1 dst_tbl:t1 dst_dsn:A=utf8mb4,P=3307,h=172.17.0.3,p=... lock:0 transaction:1 changing_src:0 replicate:0 bidirectional:0 pid:17159 user:root host:slave*/;
-- salve 再次查詢數據一致
mysql> select * from testdb1.t1;
+----+-----------+
| id | val |
+----+-----------+
| 1 | hi |
| 2 | hello |
| 3 | good luck |
+----+-----------+
3 rows in set (0.00 sec)
三、主庫binlog丟失
? 主庫binlog被刪除,從庫會因為讀取日志失敗產生錯誤。要解決這個問題,通常就是找一個業務空閑期停服,然后從庫重新做主從配置。但是,有的時候考慮到數據量較大,不想重建,選擇放棄丟失binlog的這一部分數據,繼續同步。
3.1 模擬錯誤
- 停止備庫
[root@slave ~]# systemctl stop mysqld
[root@slave ~]# ps -ef|grep mysql
root 22632 17129 0 16:39 pts/1 00:00:00 grep --color=auto mysql
- 主庫操作,并刪除binlog日志
mysql> flush logs;
Query OK, 0 rows affected (0.01 sec)
mysql> delete from testdb1.t1 where id=1;
Query OK, 1 row affected (0.00 sec)
mysql> flush logs;
Query OK, 0 rows affected (0.01 sec)
mysql> delete from testdb1.t1 where id=2;
Query OK, 1 row affected (0.01 sec)
mysql> insert into t1 values (4,'hi');
Query OK, 1 row affected (0.01 sec)
mysql> select * from t1;
+----+-----------+
| id | val |
+----+-----------+
| 3 | good luck |
| 4 | hi |
+----+-----------+
2 rows in set (0.00 sec)
mysql> show master status;
+-------------------+----------+--------------+-------------------------------------------------+-----------
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+-------------------+----------+--------------+-------------------------------------------------+-----------
| master-bin.000005 | 773 | | mysql,information_schema,performance_schema,sys | 2a94bc88-5018-11ed-9277-0242ac110002:1-10 |
+-------------------+----------+--------------+-------------------------------------------------+-----------
1 row in set (0.00 sec)
-- 刪除主庫binlog日志
[root@master data]# rm -rf master-bin.000003 master-bin.000004
- 從庫啟動,同步報錯
[root@slave ~]# systemctl start mysqld
[root@slave ~]# mysql -uroot -plhr
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 14
Server version: 8.0.30 MySQL Community Server - GPL
Copyright (c) 2000, 2022, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 172.17.0.2
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000003
Read_Master_Log_Pos: 197
Relay_Log_File: slave-relay-bin.000004
Relay_Log_Pos: 415
Relay_Master_Log_File: master-bin.000003
Slave_IO_Running: No
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: information_schema,performance_schema,mysql,sys
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 197
Relay_Log_Space: 635
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 13114
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Cannot replicate because the master purged required binary logs. Replicate the missing transactions from elsewhere, or provision a new slave from backup. Consider increasing the master's binary log expiration period. The GTID set sent by the slave is '2a94bc88-5018-11ed-9277-0242ac110002:1-7,
2c1a941e-5018-11ed-90e0-0242ac110003:1-9', and the missing transactions are '2a94bc88-5018-11ed-9277-0242ac110002:8''
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 80303306
Master_UUID: 2a94bc88-5018-11ed-9277-0242ac110002
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp: 221020 16:46:16
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 2a94bc88-5018-11ed-9277-0242ac110002:2-7
Executed_Gtid_Set: 2a94bc88-5018-11ed-9277-0242ac110002:1-7,
2c1a941e-5018-11ed-90e0-0242ac110003:1-9
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set, 1 warning (0.01 sec)
ERROR:
No query specified
結論:可以看到拋出如下錯誤:
Relay_Master_Log_File: master-bin.000003 --slave正在同步master上的binlog日志名,但是已經被刪了,所以報錯
Slave_IO_Running: No
Last_IO_Errno: 13114
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Cannot replicate because the master purged required binary logs. Replicate the missing transactions from elsewhere, or provision a new slave from backup. Consider increasing the master's binary log expiration period. The GTID set sent by the slave is '2a94bc88-5018-11ed-9277-0242ac110002:1-7,
2c1a941e-5018-11ed-90e0-0242ac110003:1-9', and the missing transactions are '2a94bc88-5018-11ed-9277-0242ac110002:8''
3.2 解決辦法
- 清空原主從配置
-- 從庫操作
mysql> reset master; -- 清空從庫的 gtid
Query OK, 0 rows affected (0.01 sec)
mysql> stop slave;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> reset slave all; --清空從庫信息
Query OK, 0 rows affected, 1 warning (0.00 sec)
- 為了盡可能的少丟數據,我們檢查主庫目前最舊的binlog日志
[root@master data]# mysqlbinlog -v --base64-output=DECODE-ROWS master-bin.000005 |more
# The proper term is pseudo_replica_mode, but we use this compatibility alias
# to make the statement usable on server versions 8.0.24 and older.
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#221020 16:43:40 server id 80303306 end_log_pos 126 CRC32 0x57df434c Start: binlog v 4, server v 8.0.30 created 221020 16:43:40
# Warning: this binlog is either in use or was not closed properly.
# at 126
#221020 16:43:40 server id 80303306 end_log_pos 197 CRC32 0xebea1c88 Previous-GTIDs
# 2a94bc88-5018-11ed-9277-0242ac110002:1-8
結論:Previous-GTIDs 2a94bc88-5018-11ed-9277-0242ac110002:1-8
- 手動設置GTID_PURGED
mysql> SET @@GLOBAL.GTID_PURGED='2a94bc88-5018-11ed-9277-0242ac110002:1-8';
Query OK, 0 rows affected (0.00 sec)
mysql> show master status;
+------------------+----------+--------------+-------------------------------------------------+------------
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+-------------------------------------------------+------------
| slave-bin.000001 | 157 | | mysql,information_schema,performance_schema,sys | 2a94bc88-5018-11ed-9277-0242ac110002:1-8 |
+------------------+----------+--------------+-------------------------------------------------+------------
1 row in set (0.00 sec)
mysql> change master to
-> master_host='172.17.0.2',
-> master_port=3306,
-> master_user='repl',
-> master_password='repl',
-> master_auto_position=1;
Query OK, 0 rows affected, 8 warnings (0.02 sec)
mysql> start slave;
Query OK, 0 rows affected, 1 warning (0.01 sec)
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 172.17.0.2
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000005
Read_Master_Log_Pos: 773
Relay_Log_File: slave-relay-bin.000002
Relay_Log_Pos: 951
Relay_Master_Log_File: master-bin.000005
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: information_schema,performance_schema,mysql,sys
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 773
Relay_Log_Space: 1161
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 80303306
Master_UUID: 2a94bc88-5018-11ed-9277-0242ac110002
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 2a94bc88-5018-11ed-9277-0242ac110002:9-10
Executed_Gtid_Set: 2a94bc88-5018-11ed-9277-0242ac110002:1-10
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set, 1 warning (0.00 sec)
ERROR:
No query specified
結論:主從同步恢復,但是因為舍棄了主庫丟失的binlog日志,導致部分同步事務丟失,數據不一致,案例中從庫t1表id=1的行數據未同步刪除,id=2的行數據被正常同步刪除。?
mysql> select * from t1;
+----+-----------+
| id | val |
+----+-----------+
| 1 | hi |
| 3 | good luck |
| 4 | hi |
+----+-----------+
3 rows in set (0.00 sec)
四、從庫日志丟失
誤刪除了從庫上的relay log,導致還有事務沒應用,發生主從故障。
4.1 模擬錯誤
- 備庫停止同步
mysql> stop slave sql_thread;
Query OK, 0 rows affected, 1 warning (0.00 sec)
- 刪除還沒有同步relay log
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 172.17.0.2
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000008
Read_Master_Log_Pos: 197
Relay_Log_File: slave-relay-bin.000005
Relay_Log_Pos: 415
Relay_Master_Log_File: master-bin.000007
Slave_IO_Running: Yes
Slave_SQL_Running: No
[root@slave data]# ll slave-relay-bin.*
-rw-r----- 1 mysql mysql 1040 Oct 24 13:19 slave-relay-bin.000004
-rw-r----- 1 mysql mysql 468 Oct 24 13:20 slave-relay-bin.000005
-rw-r----- 1 mysql mysql 298 Oct 24 13:20 slave-relay-bin.000006
-rw-r----- 1 mysql mysql 701 Oct 24 13:23 slave-relay-bin.000007
-rw-r----- 1 mysql mysql 100 Oct 24 13:20 slave-relay-bin.index
-- 刪除從庫relay log
[root@slave data]# rm -rf slave-relay-bin.000006 slave-relay-bin.000007
- 主庫繼續插入數據
mysql> insert into testdb1.t1 values (5,'hello');
Query OK, 1 row affected (0.01 sec)
mysql> flush logs;
Query OK, 0 rows affected (0.01 sec)
mysql> insert into testdb1.t1 values (6,'hello');
Query OK, 1 row affected (0.00 sec)
- 啟動slave,報錯
mysql> start slave sql_thread;
Query OK, 0 rows affected, 1 warning (0.01 sec)
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 172.17.0.2
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000009
Read_Master_Log_Pos: 483
Relay_Log_File: slave-relay-bin.000005
Relay_Log_Pos: 415
Relay_Master_Log_File: master-bin.000007
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB: information_schema,performance_schema,mysql,sys
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 13121
Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, the server was unable to fetch a keyring key required to open an encrypted relay log file, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Skip_Counter: 0
Exec_Master_Log_Pos: 197
Relay_Log_Space: 2519
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 13121
Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, the server was unable to fetch a keyring key required to open an encrypted relay log file, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Replicate_Ignore_Server_Ids:
Master_Server_Id: 80303306
Master_UUID: 2a94bc88-5018-11ed-9277-0242ac110002
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp: 221024 13:24:33
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 2a94bc88-5018-11ed-9277-0242ac110002:9-16
Executed_Gtid_Set: 2a94bc88-5018-11ed-9277-0242ac110002:1-14
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set, 1 warning (0.00 sec)
ERROR:
No query specified
結論:可以看到拋出如下錯誤:
Slave_SQL_Running: No
Last_Errno: 13121
Last_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log), a network problem, the server was unable to fetch a keyring key required to open an encrypted relay log file, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
4.2 解決辦法
- 根據slave status 信息,可以得到我們從庫同步中斷,所對應的主庫位置:?
Relay_Master_Log_File: master-bin.000007 Exec_Master_Log_Pos: 197 Executed_Gtid_Set: 2a94bc88-5018-11ed-9277-0242ac110002:1-14
主要觀察Relay_Master_Log_File和Exec_Master_Log_Pos、Executed_Gtid_Set的值,代表了從庫已經處理的中繼日志文件和位置所對應的binlog文件和位置。也可以通過查詢主庫binlog信息,進一步確認上面信息是否正確。
[root@master data]# mysqlbinlog -v --base64-output=DECODE-ROWS master-bin.000007
# The proper term is pseudo_replica_mode, but we use this compatibility alias
# to make the statement usable on server versions 8.0.24 and older.
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#221024 13:18:26 server id 80303306 end_log_pos 126 CRC32 0xd75fbc8a Start: binlog v 4, server v 8.0.30 created 221024 13:18:26
# at 126
#221024 13:18:26 server id 80303306 end_log_pos 197 CRC32 0xb2e81f0f Previous-GTIDs
# 2a94bc88-5018-11ed-9277-0242ac110002:1-14
# at 197
#221024 13:20:58 server id 80303306 end_log_pos 245 CRC32 0xdd30bcb7 Rotate to master-bin.000008 pos: 4
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
- 重新配置同步
-- 從庫操作
mysql> reset master; -- 清空從庫的 gtid
Query OK, 0 rows affected (0.01 sec)
mysql> stop slave;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> reset slave all; --清空從庫信息
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> SET @@GLOBAL.GTID_PURGED='2a94bc88-5018-11ed-9277-0242ac110002:1-14';
Query OK, 0 rows affected (0.01 sec)
mysql> show master status;
+------------------+----------+--------------+-------------------------------------------------+------------
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+-------------------------------------------------+------------
| slave-bin.000001 | 157 | | mysql,information_schema,performance_schema,sys | 2a94bc88-5018-11ed-9277-0242ac110002:1-14 |
+------------------+----------+--------------+-------------------------------------------------+------------
1 row in set (0.01 sec)
mysql> change master to
-> master_host='172.17.0.2',
-> master_port=3306,
-> master_user='repl',
-> master_password='repl',
-> master_auto_position=1;
Query OK, 0 rows affected, 8 warnings (0.02 sec)
mysql> start slave;
Query OK, 0 rows affected, 1 warning (0.01 sec)
- 檢查同步正常
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for source to send event
Master_Host: 172.17.0.2
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: master-bin.000009
Read_Master_Log_Pos: 483
Relay_Log_File: slave-relay-bin.000004
Relay_Log_Pos: 701
Relay_Master_Log_File: master-bin.000009
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB: information_schema,performance_schema,mysql,sys
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 483
Relay_Log_Space: 999
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 80303306
Master_UUID: 2a94bc88-5018-11ed-9277-0242ac110002
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Replica has read all relay log; waiting for more updates
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 2a94bc88-5018-11ed-9277-0242ac110002:15-16
Executed_Gtid_Set: 2a94bc88-5018-11ed-9277-0242ac110002:1-16
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
Master_public_key_path:
Get_master_public_key: 0
Network_Namespace:
1 row in set, 1 warning (0.00 sec)
ERROR:
No query specified
-- 主庫新插入的數據也都同步過來了
mysql> select * from testdb1.t1;
+----+-----------+
| id | val |
+----+-----------+
| 1 | hi |
| 3 | good luck |
| 4 | hi |
| 5 | hello |
| 6 | hello |
+----+-----------+
5 rows in set (0.00 sec)




