Tech Sharing | SQL Optimization: The Flaw in ICP

Actiontech Open Source Community, 2022-12-01


Author: Hu Chengqing (胡呈清)

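Member of the Actiontech DBA team, specializing in failure analysis and performance optimization. Personal blog: https://www.jianshu.com/u/a95ec11f67a8. Discussion is welcome.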

Source: original submission

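*Produced by the Actiontech Open Source Community. Original content may not be used without authorization; to repost, please contact the editor and credit the source.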

What is ICP (Index Condition Pushdown)

ICP stands for Index Condition Pushdown. I covered it in an earlier article, "EXPLAIN Execution Plans Explained, Part 2: Extra".

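When data is looked up through a secondary index, some conditions in the WHERE clause involve columns that are part of the index but cannot be used to narrow the index scan. MySQL pushes these conditions down to the storage engine layer, which uses them to filter index entries before going back to the table (a "table lookup": fetching the full row from the primary key index). This reduces the number of table lookups.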

For example, given the index idx_test(birth_date,first_name,hire_date), the query
select * from employees where birth_date >= '1957-05-23' and birth_date <='1960-06-01' and hire_date>'1998-03-22';
is executed as follows:

1. Using the condition birth_date >= '1957-05-23' and birth_date <='1960-06-01', MySQL looks up entries in the idx_test index; assume 100,000 rows are returned.

2. The 100,000 rows found contain the hire_date column, so MySQL pushes the condition hire_date >'1998-03-22' down to the storage engine to filter them further; assume 1000 rows remain.

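3. Because the query needs the values of all columns, and the 1000 rows found so far contain only birth_date, first_name and hire_date (plus the primary key value), MySQL has to go back to the table to fetch every column. A table lookup takes the primary key value of each of these 1000 rows and searches the primary key index for it one at a time (MRR can also be enabled to look up a batch of primary key values at once), so there are 1000 lookups. Without ICP there would be 100,000.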
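To make the three steps concrete, here is a minimal Python sketch that models a range scan over idx_test with and without a pushed-down condition and counts the resulting table lookups. It is a toy model, not InnoDB code: the `IndexEntry` type, the `range_scan` function and the five sample rows are all hypothetical, and the index is represented as a Python list sorted by `birth_date`.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IndexEntry:
    # Hypothetical idx_test entry: the indexed columns plus the primary key (emp_no).
    birth_date: date
    first_name: str
    hire_date: date
    emp_no: int


def range_scan(entries, lo, hi, pushed_condition=None):
    """Return the primary keys that would need a table lookup.

    entries must be sorted by birth_date (index order). Without ICP
    (pushed_condition is None) every entry in the range causes a lookup;
    with ICP the pushed-down predicate is checked on the index entry first.
    """
    lookups = []
    for e in entries:
        if e.birth_date < lo:
            continue
        if e.birth_date > hi:
            break  # index is ordered by birth_date, so the range is finished
        if pushed_condition is not None and not pushed_condition(e):
            continue  # filtered inside the "engine" without reading the full row
        lookups.append(e.emp_no)
    return lookups


# Hypothetical sample data, sorted by birth_date like the real index.
entries = sorted([
    IndexEntry(date(1955, 1, 1), "A", date(1990, 1, 1), 1),
    IndexEntry(date(1958, 3, 2), "B", date(1999, 5, 5), 2),
    IndexEntry(date(1959, 7, 8), "C", date(1985, 2, 2), 3),
    IndexEntry(date(1960, 1, 1), "D", date(1998, 3, 1), 4),
    IndexEntry(date(1961, 1, 1), "E", date(2000, 1, 1), 5),
], key=lambda e: e.birth_date)

lo, hi = date(1957, 5, 23), date(1960, 6, 1)


def hire_after(e):
    # The condition that ICP pushes down: hire_date > '1998-03-22'.
    return e.hire_date > date(1998, 3, 22)


print(len(range_scan(entries, lo, hi)))              # 3 lookups without ICP
print(len(range_scan(entries, lo, hi, hire_after)))  # 1 lookup with ICP
```

With the pushed-down `hire_date` predicate only one of the three index entries in the range triggers a lookup; without it all three do, mirroring the 1000 versus 100,000 lookups in the example above.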

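Clearly, ICP reduces the number of table lookups at execution time, which in a cost-based optimizer means a lower execution cost. But when the optimizer chooses the best plan during the optimization phase, does it actually take into account that ICP lowers the cost? Let's answer this with an experiment.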

Experiment

First, prepare some data: download the Employees Sample Database and import it into MySQL: https://dev.mysql.com/doc/employee/en/employees-installation.html

Using the same example as above, create a composite index:

alter table employees add index idx_test(birth_date,first_name,hire_date);

Run the following SQL:

SELECT *
FROM employees
WHERE birth_date >= '1957-05-23'
    AND birth_date <= '1960-06-01'
    AND hire_date > '1998-03-22';

The execution plan is:

mysql [localhost:5735] {msandbox} (employees) > explain select * from employees where birth_date >= '1957-05-23' and birth_date <='1960-06-01' and hire_date>'1998-03-22';
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | employees | NULL       | ALL  | idx_test      | NULL | NULL    | NULL | 298980 |    15.74 | Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------------+

As you can see, idx_test is not used. If we force idx_test with a hint, ICP can be used, as we know; the execution plan is:

mysql [localhost:5735] {msandbox} (employees) > explain select * from employees force index(idx_test) where birth_date >= '1957-05-23' and birth_date <='1960-06-01' and hire_date>'1998-03-22';
+----+-------------+-----------+------------+-------+---------------+----------+---------+------+--------+----------+-----------------------+
| id | select_type | table     | partitions | type  | possible_keys | key      | key_len | ref  | rows   | filtered | Extra                 |
+----+-------------+-----------+------------+-------+---------------+----------+---------+------+--------+----------+-----------------------+
|  1 | SIMPLE      | employees | NULL       | range | idx_test      | idx_test | 3       | NULL | 141192 |    33.33 | Using index condition |
+----+-------------+-----------+------------+-------+---------------+----------+---------+------+--------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)

Now let's look at the slow log to see the actual execution efficiency:

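The full table scan examines 300024 rows and takes 0.15 seconds.
Going through idx_test scans 141192 rows. (The Rows_examined: 1065 reported for it is a bug and is clearly not the number of rows scanned. The scan count can be read from the execution plan: in this example the rows value in the plan is the actual number of rows scanned, not an estimate. This detail does not affect the rest of the article.) Because there are no other conditions, the number of rows returned also tells us that the number of table lookups is 1065, and execution takes only 0.037 seconds.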

# Time: 2022-11-24T18:02:01.001734+08:00
# Query_time: 0.146939  Lock_time: 0.000850 Rows_sent: 1065  Rows_examined: 300024
SET timestamp=1669284095;
select * from employees where birth_date >= '1957-05-23' and birth_date <='1960-06-01' and hire_date>'1998-03-22';
# Time: 2022-11-24T18:01:09.001223+08:00
# Query_time: 0.037211  Lock_time: 0.001649 Rows_sent: 1065  Rows_examined: 1065
SET timestamp=1669284032;
select * from employees force index(idx_test) where birth_date >= '1957-05-23' and birth_date <='1960-06-01' and hire_date>'1998-03-22';

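Clearly, using idx_test is more efficient than a full table scan, so why doesn't the optimizer choose it? A safe answer is that the optimizer has its own algorithm and does not choose based on what a human would judge as faster or slower. This time, though, let's get to the bottom of it: what is the optimizer's algorithm?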

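The answer is cost. When choosing the best execution plan, the optimizer computes the cost of every available plan and picks the one with the lowest cost. Cost is computed with well-defined formulas, and explain format=json displays the cost of a plan, so we can use it to check whether ICP affects the cost of an execution plan. For a detailed explanation of the explain format=json output, see "explain format=json Explained"; we won't go into it here.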
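As a greatly simplified sketch of that rule, assuming each candidate plan has already been reduced to a single cost number, plan selection is just taking the minimum. The plan names and the `choose_plan` helper are hypothetical; the two costs are the `query_cost` values that explain format=json reports for this query later in the article. The real MySQL optimizer enumerates and prunes plans in far more detail.

```python
# Hypothetical candidate plans mapped to their estimated cost (query_cost).
candidate_plans = {
    "full table scan (ALL)": 60725.00,
    "range scan on idx_test": 197669.81,
}


def choose_plan(plans):
    """Return the name of the plan with the lowest estimated cost."""
    return min(plans, key=plans.get)


print(choose_plan(candidate_plans))  # full table scan (ALL)
```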

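Cost Calculation

1. I/O cost

Table data and indexes are stored on disk. To query records in a table, the data or index pages must first be loaded into memory before they can be processed. The time spent on this disk-to-memory loading is called the I/O cost.

2. CPU cost

The time spent reading records, checking whether they satisfy the search conditions, sorting the result set, and similar operations is called the CPU cost.

3. Cost constants

For InnoDB, the page is the basic unit of transfer between disk and memory. In MySQL 5.7, the default cost of reading one page is 1.0, and the default cost of reading a record and checking whether it matches the search conditions is 0.2. Numbers such as 1.0 and 0.2 are called cost constants (they can differ between versions; you can inspect them in mysql.server_cost and mysql.engine_cost).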

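Without any intervention, the optimizer chooses a full table scan with a total cost of "query_cost": "60725.00", calculated as follows:

I/O cost: 929*1 = 929 (929 is the number of pages in the primary key index, obtained from the table statistics as Data_length/pagesize)
CPU cost: 298980*0.2 = 59796 (298980 is the number of rows scanned; for a full table scan this is an estimate, namely Rows in the table statistics)
Total cost = I/O cost + CPU cost = 929 + 59796 = 60725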
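The same arithmetic as a small Python sketch, assuming the MySQL 5.7 defaults quoted above (`io_block_read_cost` = 1.0 in `mysql.engine_cost`, `row_evaluate_cost` = 0.2 in `mysql.server_cost`). The helper name `full_scan_cost` is hypothetical, and the formula covers only the two terms used in this article, not every adjustment the optimizer makes.

```python
# Assumed MySQL 5.7 default cost constants; check mysql.server_cost / mysql.engine_cost.
IO_BLOCK_READ_COST = 1.0  # cost of reading one page
ROW_EVALUATE_COST = 0.2   # cost of reading and checking one row


def full_scan_cost(pages, estimated_rows,
                   io_cost=IO_BLOCK_READ_COST, row_cost=ROW_EVALUATE_COST):
    """Simplified full-table-scan cost: read every page, then evaluate every row."""
    io = pages * io_cost
    cpu = estimated_rows * row_cost
    return io + cpu


# 929 pages and 298980 estimated rows come from the table statistics above.
cost = full_scan_cost(929, 298980)
print(round(cost, 2))  # 60725.0
```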

mysql [localhost:5735] {msandbox} (employees) > explain format=json select * from employees  where birth_date >= '1957-05-23' and birth_date <='1960-06-01' and hire_date>'1998-03-22'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "60725.00"
    },
    "table": {
      "table_name": "employees",
      "access_type": "ALL",
      "possible_keys": [
        "idx_test"
      ],
      "rows_examined_per_scan": 298980,
      "rows_produced_per_join": 47059,
      "filtered": "15.74",
      "cost_info": {
        "read_cost": "51313.14",
        "eval_cost": "9411.86",
        "prefix_cost": "60725.00",
        "data_read_per_join": "6M"
      },
      "used_columns": [
        "emp_no",
        "birth_date",
        "first_name",
        "last_name",
        "gender",
        "hire_date"
      ],
      "attached_condition": "((`employees`.`employees`.`birth_date` >= '1957-05-23') and (`employees`.`employees`.`birth_date` <= '1960-06-01') and (`employees`.`employees`.`hire_date` > '1998-03-22'))"
    }
  }
}
1 row in set, 1 warning (0.00 sec)

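When the hint forces idx_test, the total cost is "query_cost": "197669.81", calculated as follows:

Cost of accessing the idx_test index:

I/O cost = 1*1 = 1 (the optimizer assumes that reading one range of the index costs the same I/O as reading one page, and the condition contains only one range: birth_date >= '1957-05-23' and birth_date <='1960-06-01')
CPU cost = 141192*0.2 = 28238.4 (rows scanned: "rows_examined_per_scan": 141192)

Cost of the table lookups (the effect of index condition pushdown is not considered, so the number of lookups equals the number of index rows scanned):

Lookup I/O cost = 141192*1 = 141192
Lookup CPU cost = 141192*0.2 = 28238.4

Total cost: 1 + 28238.4 + 141192 + 28238.4 = 197669.8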
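A sketch of the four-term formula above under the same assumptions (MySQL 5.7 default constants, one I/O unit per index range). The function `range_then_lookup_cost` and its parameter names are hypothetical; what it encodes is the point stated above, that the optimizer passes the number of index rows scanned as the number of lookups.

```python
IO_BLOCK_READ_COST = 1.0  # assumed 5.7 default
ROW_EVALUATE_COST = 0.2   # assumed 5.7 default


def range_then_lookup_cost(ranges, index_rows, lookups,
                           io_cost=IO_BLOCK_READ_COST, row_cost=ROW_EVALUATE_COST):
    """Simplified cost of a secondary-index range scan followed by table lookups."""
    index_io = ranges * io_cost        # each index range counted like one page read
    index_cpu = index_rows * row_cost  # evaluate every index entry in the range
    lookup_io = lookups * io_cost      # one primary-key read per lookup
    lookup_cpu = lookups * row_cost    # evaluate every fetched full row
    return index_io + index_cpu + lookup_io + lookup_cpu


# As in the article: ICP is ignored, so lookups == index rows scanned.
cost = range_then_lookup_cost(ranges=1, index_rows=141192, lookups=141192)
print(round(cost, 2))  # 197669.8
```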

mysql [localhost:5735] {msandbox} (employees) > explain format=json select * from employees force index(idx_test) where birth_date >= '1957-05-23' and birth_date <='1960-06-01' and hire_date>'1998-03-22'\G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "197669.81"
    },
    "table": {
      "table_name": "employees",
      "access_type": "range",
      "possible_keys": [
        "idx_test"
      ],
      "key": "idx_test",
      "used_key_parts": [
        "birth_date"
      ],
      "key_length": "3",
      "rows_examined_per_scan": 141192,
      "rows_produced_per_join": 47059,
      "filtered": "33.33",
      "index_condition": "((`employees`.`employees`.`birth_date` >= '1957-05-23') and (`employees`.`employees`.`birth_date` <= '1960-06-01') and (`employees`.`employees`.`hire_date` > '1998-03-22'))",
      "cost_info": {
        "read_cost": "188257.95",
        "eval_cost": "9411.86",
        "prefix_cost": "197669.81",
        "data_read_per_join": "6M"
      },
      "used_columns": [
        "emp_no",
        "birth_date",
        "first_name",
        "last_name",
        "gender",
        "hire_date"
      ]
    }
  }
}
1 row in set, 1 warning (0.00 sec)

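Conclusion

Judging by the costs from the previous step, the full table scan costs 60725 while the idx_test index access costs 197669.81, so the optimizer chooses the full table scan.

In reality ICP reduces the number of table lookups: with idx_test the actual number of lookups is 1065, so the lookup cost should be:

I/O cost: 1065*1 = 1065
CPU cost: 1065*0.2 = 213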

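However, when computing the lookup cost, the optimizer clearly does not take ICP into account: it uses the 141192 index rows scanned directly as the number of lookups. The resulting lookup cost is huge, and the total cost ends up far higher than that of the full table scan.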
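To quantify the gap, the sketch below recomputes both plans and adds a third, purely hypothetical estimate in which the lookup count is the 1065 rows that actually survived ICP. This is not something MySQL 5.7 computes: an ICP-aware model would have to estimate that count before execution (for example from the selectivity of hire_date > '1998-03-22'), so the third number only illustrates how much the ignored filter matters. The constants and helper names are the same assumptions as in the previous sketches.

```python
IO_BLOCK_READ_COST = 1.0  # assumed 5.7 default
ROW_EVALUATE_COST = 0.2   # assumed 5.7 default


def range_then_lookup_cost(ranges, index_rows, lookups,
                           io_cost=IO_BLOCK_READ_COST, row_cost=ROW_EVALUATE_COST):
    """Hypothetical helper: index range scan cost plus table-lookup cost."""
    return (ranges * io_cost + index_rows * row_cost
            + lookups * io_cost + lookups * row_cost)


full_scan = 929 * IO_BLOCK_READ_COST + 298980 * ROW_EVALUATE_COST
as_estimated = range_then_lookup_cost(1, 141192, 141192)  # what the optimizer computes
icp_aware = range_then_lookup_cost(1, 141192, 1065)       # hypothetical: lookups after ICP

for name, value in [("full table scan", full_scan),
                    ("idx_test, ICP ignored", as_estimated),
                    ("idx_test, ICP-aware (hypothetical)", icp_aware)]:
    print(f"{name}: {value:.1f}")
# full table scan: 60725.0
# idx_test, ICP ignored: 197669.8
# idx_test, ICP-aware (hypothetical): 29517.4
```

Under this hypothetical model the index plan would cost about 29517.4, less than half of the full scan's 60725, which would agree with the slow log, where the index plan ran faster.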

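So the conclusion is: ICP improves efficiency during the execution phase, but it does not improve the execution plan chosen during the optimization phase.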

Keywords: #SQL optimization# #ICP# #index condition pushdown#


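About SQLE

SQLE, from the Actiontech Open Source Community, is a SQL audit tool for database users and administrators. It supports auditing in multiple scenarios and a standardized release workflow, supports MySQL auditing natively, and can be extended to other database types.

How to get it

Type	URL
Repository	https://github.com/actiontech/sqle
Documentation	https://actiontech.github.io/sqle-docs-cn/
Release notes	https://github.com/actiontech/sqle/releases
Audit plugin development docs	https://actiontech.github.io/sqle-docs-cn/3.modules/3.7_auditplugin/auditplugin_development.html

For more information about SQLE and to join the discussion, please join the official QQ group: 637150065...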

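This article is reposted from the Actiontech Open Source Community. If you believe it infringes your rights, please email contact@modb.pro with supporting evidence; once verified, modb.pro (墨天輪) will remove the content immediately.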

