這段時間做一個大數據量查詢優化時偶然發現一個和分區有關的MySQL 優化特性ICP的例子,回來翻閱一下MySQL的參考手冊,找了些數據測試了一下,對這個case有了進一步的了解,在這里分享一下。先介紹一下MySQL的ICP (Index Condition Pushdown,索引條件下推)優化特性。
1 Index Condition Pushdown,索引條件下推(ICP)
分布式數據庫里有一項存算分離技術,這項技術的一個重要的特性計算下推,將部分計算下推到存儲引擎里執行。Oracle 一體機Exadata一個重要的特性也是數據密集處理查詢offload到存儲節點的智能軟件進行。實際上,MySQL也是一個存算分離的結構,它的引擎和計算引擎也是分離的。而且MySQL的存儲引擎也執行了大量的計算操作。ICP就是一種優化計算下推的技術,它的核心思想是將WHERE條件中索引相關的過濾操作"下推"到存儲引擎層執行。傳統方式中,存儲引擎先通過索引定位記錄,然后將完整記錄返回給服務器層進行過濾;而ICP允許存儲引擎在讀取索引時就應用部分WHERE條件進行過濾。當查詢使用二級索引且WHERE條件包含索引列和非索引列時,ICP可以將索引列的過濾條件下推到存儲引擎。這樣存儲引擎在掃描索引時就能提前過濾掉不符合條件的記錄,減少回表操作和服務器層處理的數據量,顯著提升查詢性能,特別是對于組合索引和范圍查詢場景效果明顯。ICP通過減少I/O和CPU開銷來優化查詢效率。
2 測試數據
mysql> show create table sales;
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| sales | CREATE TABLE `sales` (
`id` int NOT NULL AUTO_INCREMENT,
`product_id` int NOT NULL,
`customer_id` int NOT NULL,
`sale_date` date NOT NULL,
`amount` decimal(10,2) NOT NULL,
`region` varchar(50) DEFAULT NULL,
PRIMARY KEY (`id`,`sale_date`),
KEY `idx_sales_cust_id` (`customer_id`),
KEY `idx_p_c_r` (`product_id`,`customer_id`,`region`)
) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
/*!50100 PARTITION BY RANGE (year(`sale_date`))
(PARTITION p2018 VALUES LESS THAN (2019) ENGINE = InnoDB,
PARTITION p2019 VALUES LESS THAN (2020) ENGINE = InnoDB,
PARTITION p2020 VALUES LESS THAN (2021) ENGINE = InnoDB,
PARTITION p2021 VALUES LESS THAN (2022) ENGINE = InnoDB,
PARTITION p2022 VALUES LESS THAN (2023) ENGINE = InnoDB,
PARTITION pmax VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
表里的數據
mysql> select * from sales limit 5;
+----+------------+-------------+------------+--------+-----------+
| id | product_id | customer_id | sale_date | amount | region |
+----+------------+-------------+------------+--------+-----------+
| 7 | 254 | 1955 | 2018-03-16 | 895.48 | North |
| 28 | 298 | 3207 | 2018-11-03 | 970.72 | South |
| 56 | 352 | 4006 | 2018-01-20 | 139.03 | Southwest |
| 57 | 168 | 1631 | 2018-09-17 | 895.66 | South |
| 61 | 335 | 1238 | 2018-06-27 | 959.53 | West |
+----+------------+-------------+------------+--------+-----------+
5 rows in set (0.00 sec)
3 ICP應用的普通場景
ICP只能應用于使用輔助索引訪問表的場景,不能用于主鍵訪問表的場景。其典型的場景是表上創建了聯合索引,但卻不能充分利用聯合索引的情況下。以sales表的聯合索引idx_sales_cust_id為例,查詢where條件包含product_id列,但沒有customer_id列,或者有customer_id,但region列是like ‘%th’。這樣語句只能根據product_id列查詢索引后回到MySQL層面回表后應用其它查詢條件。ICP優化的目的是將customer_id和region的條件下推到存儲引擎在索引查找時執行,減少回表次數。比如,現在要查詢的時id為254的產品在North區域的銷售信息,sql語句如下
select * from sales where product_id=254 and region=‘North’;
在開啟和未開啟ICP時語句的實際執行計劃(MySQL 8.0 可以用explain analyze獲得)對比如下:
--ICP開啟(默認情況下)
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Index lookup on sales using idx_p_c_r (product_id=254), with index condition: (sales.region = 'North') (cost=73.8 rows=284) (actual time=0.0631..0.254 rows=33 loops=1)|
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
--關閉ICP
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (sales.region = 'North') (cost=73.8 rows=28.4) (actual time=0.25..1.41 rows=33 loops=1)
-> Index lookup on sales using idx_p_c_r (product_id=254) (cost=73.8 rows=284) (actual time=0.242..1.34 rows=284 loops=1)|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
首先注意到的是不論ICP是否開啟,MySQL優化器對語句評估的成本相同(cost=73.8),對返回行數的評估也都是不準確的,這可能意味著如果有其它可行執行計劃,評估成本低于73.8,優化器會選擇其它的執行計劃,社區內有一篇文章技術分享 | SQL 優化:ICP 的缺陷對此有詳細的說明,即ICP不會影響執行計劃的選擇,只會影響已被選擇的執行計劃的執行。ICP開啟時(這個特性MySQL數據庫默認是開啟的),語句執行了ICP,使用索引idx_p_c_r,ICP的條件是sales.region = ‘North’,這個語句實際執行時返回了33行,執行時間0.06秒;當關閉ICP時,語句執行的ref操作,分為兩個步驟,首先查詢索引idx_p_c_r,索引的查詢條件是product_id=254,這個操作優化器的評估和實際執行都是準確的,評估和實際返回的行數都是284行,實際的執行時間也較長,達到了0.24秒以上,然后查詢sales表,根據條件對索引返回的行進行過濾),索引返回了284行,這一步左就要對根據索引返回的主鍵對sales表進行284此查詢并過濾。執行完這兩部后,實際返回33行,消耗時間0.25秒。從這兩個執行計劃來看,ICP的執行效率回表次數更少,效率更高,在where條件選擇性較高的情況下效率接近覆蓋索引掃描。
4 分區場景下ICP的應用
在這次sql優化中,偶然發現ICP不但可以用于聯合索引的場景,有時也可以用于單列索引的場景,這次是一個分區的場景。還是用之前的數據,運行下面的SQL
select * from sales where sale_date=‘2018-03-16’ and customer_id=1631;
在ICP開啟的情況下語句實際的執行計劃如下:
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Index lookup on sales using idx_sales_cust_id (customer_id=1631), with index condition: (sales.sale_date = DATE'2018-03-16') (cost=1.82 rows=7) (actual time=0.0159..0.0159 rows=0 loops=1)|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
這個執行計劃明顯優于ICP關閉時的執行計劃
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (sales.sale_date = DATE'2018-03-16') (cost=1.82 rows=0.7) (actual time=0.192..0.192 rows=0 loops=1)
-> Index lookup on sales using idx_sales_cust_id (customer_id=1631) (cost=1.82 rows=7) (actual time=0.0636..0.104 rows=7 loops=1)|
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
idx_sales_cust_id是創建在cust_id列上的單列索引,為什么這里ICP也會生效呢?理解了MySQL索引原理后,略加思索便會對個中道理了然于胸了。idx_sales_cust_id存儲的值是sales表的主鍵,sales表的主鍵是聯合主鍵(id,sale_date),包含sale_date列,這樣,索引idx_sales_cust_id里也包含了sale_date列的信息,ICP也就可以根據這個信息進行過濾了。明白了這個道理,將其推廣一下,對于聯合主鍵的,查詢條件中只要包含主鍵列,就可以利用單列索引進行ICP優化了。下面是一個簡單的例子
測試數據
mysql> show indexes from weatherdata;
+-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| weatherdata | 0 | PRIMARY | 1 | log_date | A | 4017 | NULL | NULL | | BTREE | | | YES | NULL |
| weatherdata | 0 | PRIMARY | 2 | time | A | 1155084 | NULL | NULL | | BTREE | | | YES | NULL |
| weatherdata | 0 | PRIMARY | 3 | time | A | 4623086 | NULL | NULL | | BTREE | | | YES | NULL |
| weatherdata | 1 | idx_wd | 1 | weather | A | 9 | NULL | NULL | YES | BTREE | | | YES | NULL |
+-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
4 rows in set (0.01 sec)
表weatherdata除了聯合主鍵(log_date,time,weather)外還有一個weather列上的輔助索引idx_wd,下面是一條利用這個輔助索引查詢數據的例子
mysql> explain analyze select * from sales where sale_date='2018-03-16' and customer_id=4799;
mysql> explain analyze select * from weatherdata where weather='Nebel-Regen' and station =1;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Index lookup on weatherdata using idx_wd (weather='Nebel-Regen'), with index condition: (weatherdata.station = 1) (cost=12910 rows=255176) (actual time=12.5..85.6 rows=32455 loops=1)
|
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.10 sec)
語句的執行計劃依然執行了ICP。
5 一點體會
MySQL數據庫在查詢上的優化選項還是比較多的,很多優化選項默認都是打開的,這些選項的作用很容易被忽視,這個小case可以加深對ICP優化特性的理解。




