前言
前兩天看到 Kevin崔 的文章(下面鏈接,轉(zhuǎn)發(fā)請(qǐng)注明),提到MySQL主從數(shù)據(jù)不一致的幾種情況,里面提到MySQL自身bug的原因,但是沒(méi)展開(kāi)。
MySQL 主從數(shù)據(jù)不一致有哪些情況:http://www.sunline.cc/db/500145
想到之前碰到過(guò)的案例,高度懷疑是MySQL BUG導(dǎo)致,記錄一下。
MySQL基于行的復(fù)制Row-Based Replication (RBR) 下,如果一張沒(méi)有主鍵的表被并發(fā)更新,過(guò)一段時(shí)間新增一個(gè)自增字段作為主鍵,會(huì)導(dǎo)致主從該表數(shù)據(jù)不一致。
1 環(huán)境信息
RDS MySQL:5.7.302 錯(cuò)誤描述
業(yè)務(wù)反饋生產(chǎn)RDS數(shù)據(jù)庫(kù)有張表,主從數(shù)據(jù)不一致。
主庫(kù):
mysql> select * from t1 where v_template_id='1031';
+-------+---------------------+--------+
| id | v_template_id | sku_id |
+-------+---------------------+--------+
| 70238 | 1031 | 23363 |
+-------+---------------------+--------+
1 row in set (10.87 sec)從庫(kù)1和從庫(kù)2和從庫(kù)3:
mysql> select * from t1 where v_template_id='1031';
+-------+---------------------+--------+
| id | v_template_id | sku_id |
+-------+---------------------+--------+
| 71294 | 1031 | 23363 |
+-------+---------------------+--------+
1 row in set (0.01 sec)可以看到,三個(gè)從庫(kù)t1表的數(shù)據(jù)和主庫(kù)不一致。比如上面 v_template_id='1031',對(duì)應(yīng)主鍵ID不一致。
3 排查
經(jīng)過(guò)統(tǒng)計(jì),這種情況大約有上千條。
mysql> show create table t1\G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`v_template_id` bigint(16) NOT NULL,
`sku_id` bigint(16) NOT NULL,
PRIMARY KEY (`id`),
KEY `sku_id` (`sku_id`)
) ENGINE=InnoDB AUTO_INCREMENT=47627822 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)4 原因
后面推測(cè)是MySQL BUG導(dǎo)致。
4.1 bug名稱(chēng)
Bug #92949 : add auto_increment column as PK cause RBR replication inconsistentMySQL BUG鏈接:https://bugs.mysql.com/bug.php?id=92949
4.2 bug描述
Description:
This issue comes from one of our customers.
Under RBR replication, If a table without PK is concurrently updated, and an auto_increment column is added to this table as PK after a while of updating. The table on master and slave will be inconsistent with each other.4.3 復(fù)現(xiàn)
## Use two concurrent connecitons on master
## Insert in each connection, and make sure the insert order is different
## from commit order.步驟1 主庫(kù)會(huì)話1操作
create table t1(tid int, name varchar(30), index idx_tid(tid)) engine=InnoDB;
begin;
insert into t1 values (1, 'fun');步驟2 主庫(kù)會(huì)話2操作
begin;
insert into t1 values (2, 'go');
commit;步驟3 主庫(kù)會(huì)話1操作
commit;步驟4 主庫(kù)會(huì)話1和從庫(kù)會(huì)話操作
## 主庫(kù)會(huì)話1
[root@localhost] {16:28:51} (test) [13]> select * from t1;
## 從庫(kù)會(huì)話
[root@localhost] {16:32:09} (test) [13]> select * from t1;
步驟5 主庫(kù)會(huì)話1操作
alter table t1 add column id int not null auto_increment primary key;步驟6 主庫(kù)會(huì)話1和從庫(kù)會(huì)話操作
## 主庫(kù)會(huì)話1
[root@localhost] {16:28:51} (test) [13]> select * from t1;
## 從庫(kù)會(huì)話
[root@localhost] {16:32:09} (test) [13]> select * from t1;
步驟7 可模擬更多的delete和update操作
第4步的時(shí)候,主從查到的數(shù)據(jù)條數(shù)一致,但是順序不一致;第5步新增自增主鍵id后,第6步主從上查到的數(shù)據(jù)已經(jīng)不一樣了。tid相當(dāng)于生產(chǎn)案例中的v_template_id,此時(shí)已經(jīng)復(fù)現(xiàn)了BUG。
4.4 原因
Suggested fix:
I think there are 2 points that cause this inconsistency.
1. For table without PK, the internal cluster index is ordered as insert order. In the above scenario, master and slave have different order, so the data is organized in different order, which is not a big deal in normal case.
But when the auto_incremental column is added, and PK is create on this column, the whole table is rebuild. The auto_increment value is assigned monotonically increasing, corresponding to the internal cluster index order. So the same auto_increment PK value mapped to different business data.
2. For RBR replication, if PK exists, Update/Delete row lookup is based on PK values, they other columns are not considered, as long as we can fetch data from engine according PK.
I'm not sure whether this is a bug or incorrect usage of MySQL.
But I think we can do some improvements to the RBR row lookup logic, if we found match PK, but they other fields are mismatching, an error or warning should be emitted in error log to notify user about such inconsistency.5 解決方案
生產(chǎn)問(wèn)題已經(jīng)出現(xiàn),只能手動(dòng)修復(fù)數(shù)據(jù)。
對(duì)于無(wú)主鍵的表,如果要新增自增主鍵,需要按下面方法操作
# Instead of ALTER TABLE t ADD COLUMN c INT NOT NULL AUTO_INCREMENT PRIMARY KEY:
ALTER TABLE t ADD COLUMN c INT NOT NULL;
SET @c = 0;
UPDATE t SET c = (@c := @c + 1);
ALTER TABLE t ADD PRIMARY KEY(c);
ALTER TABLE t CHANGE c c INT NOT NULL AUTO_INCREMENT;
The update will generate row events, so slave will apply this deterministically.




