
大家好, 今天和大家分享是的PG的執行計劃相關的方面內容。
和眾所周知的數據庫ORACLE,MYSQL 一樣,PG 的 優化器也是基于CBO的成本計算,來生成理論上是最佳的執行計劃。
不同的數據庫,同樣的 explain 命令, 給你帶來執行計劃的詳細輸出:
dbtest@[local:/tmp]:1992=#105846 create table tab (id int , name varchar(200));
CREATE TABLE
dbtest@[local:/tmp]:1992=#105846 insert into tab values (generate_series(1,10000),'PG execution plan');
INSERT 0 10000
dbtest@[local:/tmp]:1992=#105846 \timing
Timing is on.
dbtest@[local:/tmp]:1992=#105846 explain select * from tab;
QUERY PLAN
----------------------------------------------------------
Seq Scan on tab (cost=0.00..164.00 rows=10000 width=22)
(1 row)
Time: 0.566 ms
explain 命令 后面可以跟隨不同的參數, 含義如下:
EXPLAIN [ ( option [, ...] ) ] statement
ANALYZE [ boolean ] : 通過實際執行SQL 來獲得真實的執行計劃,每一步返回的耗時時間和行數都是真實的
dbtest@[local:/tmp]:1992=#105846 explain analyze select * from tab;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Seq Scan on tab (cost=0.00..164.00 rows=10000 width=22) (actual time=0.008..0.565 rows=10000 loops=1)
Planning Time: 0.035 ms
Execution Time: 0.870 ms
(3 rows)
Time: 1.221 ms
VERBOSE [ boolean ]: 輸出更為詳細的信息:比如 Query Identifier 這個重要的屬性 類似于mysql 的SQL digest 或者是 oracle 的SQL_ID
這個 Query Identifier 與 pg_stat_statements 插件里面是 一樣的
dbtest@[local:/tmp]:1992=#105846 explain analyze verbose select * from tab;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Seq Scan on public.tab (cost=0.00..164.00 rows=10000 width=22) (actual time=0.009..0.731 rows=10000 loops=1)
Output: id, name
Query Identifier: 4997534032644374154
Planning Time: 0.038 ms
Execution Time: 1.123 ms
(5 rows)
Time: 1.499 ms
COSTS [ boolean ]: 顯示 cost 成本, 這個默認就是 打開的
BUFFERS [ boolean ]: 顯示內存以及磁盤的讀寫情況 , Buffers: shared hit=64 表示 內存中的 64個 page 全部命中, 直接從磁盤查出數據
(select pg_size_pretty(pg_relation_size(‘tab’)); 512 kB/8kB = 64 pages )
dbtest@[local:/tmp]:1992=#105846 explain (analyze true , verbose true ,buffers true ) select * from tab;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Seq Scan on public.tab (cost=0.00..164.00 rows=10000 width=22) (actual time=0.017..2.605 rows=10000 loops=1)
Output: id, name
Buffers: shared hit=64
Query Identifier: 4997534032644374154
Planning Time: 0.087 ms
Execution Time: 4.117 ms
(6 rows)
Time: 4.695 ms
dbtest@[local:/tmp]:1992=#105846 select pg_size_pretty(pg_relation_size('tab'));
pg_size_pretty
----------------
512 kB
(1 row)
Time: 0.315 ms
WAL [ boolean ]:對WAL 日志寫入信息的統計。一般與analyze 聯合使用,達到SQL真實運下,WAL的信息準確性的目的。 這個參數是在PG 13版本引入的。
我們測試一下,插入100萬的數據產生的WAL 日志的大?。? WAL: records=1000000 bytes=69000000 大致是65M
dbtest@[local:/tmp]:1992=#113927 select 69000000/1024/1024 as "WAL size(MB)";
WAL size(MB)
--------------
65
(1 row)
dbtest@[local:/tmp]:1992=#113927 create table tab2(id int, name varchar(200));
CREATE TABLE
dbtest@[local:/tmp]:1992=#113927 explain (analyze,wal) insert into tab2 values (generate_series(1,1000000),'hello PG!');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Insert on tab2 (cost=0.00..5000.02 rows=0 width=0) (actual time=939.011..939.012 rows=0 loops=1)
WAL: records=1000000 bytes=69000000
-> ProjectSet (cost=0.00..5000.02 rows=1000000 width=422) (actual time=0.003..93.902 rows=1000000 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
Planning Time: 0.046 ms
Execution Time: 939.047 ms
(6 rows)
FORMAT { TEXT | XML | JSON | YAML }: 嘗試輸出格式的多樣性, 支持 YAML 這個格式確實是有點驚艷
dbtest@[local:/tmp]:1992=#105846 explain (analyze true , verbose true ,buffers true, format YAML ) select * from tab;
QUERY PLAN
-----------------------------------------
- Plan: +
Node Type: "Seq Scan" +
Parallel Aware: false +
Async Capable: false +
Relation Name: "tab" +
Schema: "public" +
Alias: "tab" +
Startup Cost: 0.00 +
Total Cost: 228.00 +
Plan Rows: 10000 +
Plan Width: 17 +
Actual Startup Time: 0.054 +
Actual Total Time: 0.862 +
Actual Rows: 10000 +
Actual Loops: 1 +
Output: +
- "id" +
- "name" +
Execution Time: 1.392
(1 row)
上面是簡單的介紹了一下 explain 的選項參數, 我們接下來看一下輸出的信息含義
cost 成本分為:起始成本 和 總成本 cost=0.00…164.00
返回的行數: rows = 10000
width : 返回的列的寬度 width=17
actual time=0.073…1.135 SQL解析的時間: 0.073, SQL 的總時間:1.135
實際返回的行數: rows=10000
循環的次數: loops=1 單表查詢所以循環的次數是 1
dbtest@[local:/tmp]:1992=#121533 explain analyze select * from tab ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Seq Scan on tab (cost=0.00..164.00 rows=10000 width=22) (actual time=0.008..0.641 rows=10000 loops=1)
Planning Time: 0.036 ms
Execution Time: 0.946 ms
(3 rows)
接下來我們看一下 cost 相關的成本是如何計算的?
計算公式來源于官方文檔 : (disk pages read * seq_page_cost) + (rows scanned * cpu_tuple_cost).
SELECT relpages, reltuples FROM pg_class WHERE relname = ‘tab’; --得到 64個 page 和 10000 個元祖
seq_page_cost = 1 , cpu_tuple_cost = 0.1
(disk pages read * seq_page_cost) + (rows scanned * cpu_tuple_cost) = 64 * 1 + 10000 * 0.01 = 164
和 (cost=0.00…164.00 rows=10000 width=22) 是相互吻合的
dbtest@[local:/tmp]:1992=#121533 SELECT relpages, reltuples FROM pg_class WHERE relname = 'tab';
relpages | reltuples
----------+-----------
64 | 10000
(1 row)
dbtest@[local:/tmp]:1992=#121533 show seq_page_cost;
seq_page_cost
---------------
1
(1 row)
dbtest@[local:/tmp]:1992=#121533 show cpu_tuple_cost;
cpu_tuple_cost
----------------
0.01
(1 row)
如果是帶有 where 過濾條件的呢? 成本的計算公式是 在原有的cost 基礎之上 + Filter 的成本 (cpu_operator_cost * rows)
cpu_operator_cost : 默認是 0.0025
rows : pg_class 表中的 reltuples 屬性 是 10000
原有的cost : (disk pages read * seq_page_cost) + (rows scanned * cpu_tuple_cost) = 64 * 1 + 10000 * 0.01 = 164
Filter 的成本是 cpu_operator_cost * rows = 10000 +* 0.0025 = 25
所以總的cost 是 164 + 25 = 189
dbtest@[local:/tmp]:1992=#113927 explain analyze select * from tab where name ~ 'test%';
QUERY PLAN
--------------------------------------------------------------------------------------------------
Seq Scan on tab (cost=0.00..189.00 rows=1 width=22) (actual time=10.776..10.777 rows=0 loops=1)
Filter: ((name)::text ~ 'test%'::text)
Rows Removed by Filter: 10000
Planning Time: 0.132 ms
Execution Time: 10.797 ms
(5 rows)
dbtest@[local:/tmp]:1992=#121533 show cpu_operator_cost
dbtest-# ;
cpu_operator_cost
-------------------
0.0025
(1 row)
上面我們簡單的了解一下 cost 是如何計算的, 接下來我們看一下執行計劃中表的訪問方式和表之間的連接方式:
表的訪問方式:
1)Sequential Scan 全表掃描
2) Index Scan 索引掃描
3) Index Only Scan 覆蓋索引掃描
4) Bitmap Heap Scan 索引位圖掃描
表與表的連接方式:
- Nested Loops 嵌套循環查詢連接
- Merge Join 連接
3)Hash Join 連接
Sequential Scan 全表掃描, 一般是發生在沒有可能觸發索引(或者是索引選擇率很差)的情況下,
一般適合超級小表, 或者在OLAP 分析場景下,需要掃描大量數據
執行計劃信息: Seq Scan on 表名
dbtest@[local:/tmp]:1992=#113927 explain analyze select * from tab2;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Seq Scan on tab2 (cost=0.00..15406.00 rows=1000000 width=14) (actual time=0.007..71.424 rows=1000000 loops=1)
Planning Time: 0.145 ms
Execution Time: 109.381 ms
(3 rows)
Index Scan 索引掃描,一般是發生在沒有可能觸發索引(或者是索引選擇率很高,一般在5%一下)的情況下,
適合OLTP 高并發場景,必須毫秒級別返回數據
dbtest@[local:/tmp]:1992=#113927 create index concurrently idx_name_tab2 on tab2(name);
CREATE INDEX
dbtest@[local:/tmp]:1992=#113927 create index concurrently idx_name_tab2 on tab2(name);
CREATE INDEX
dbtest@[local:/tmp]:1992=#113927 explain analyze select * from tab2 where name = 'jason' limit 10;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..4.44 rows=1 width=14) (actual time=0.015..0.015 rows=0 loops=1)
-> Index Scan using idx_name_tab2 on tab2 (cost=0.42..4.44 rows=1 width=14) (actual time=0.014..0.014 rows=0 loops=1)
Index Cond: ((name)::text = 'jason'::text)
Planning Time: 0.182 ms
Execution Time: 0.030 ms
(5 rows)
Index Only Scan 覆蓋索引掃描, 一般發生在 select 的列信息包含在索引之中。 這里值得注意的是和MYSQL 不同, PG 由于特殊的MVCC機制, 如果vacuum 不及時的話,
覆蓋索引依然會回表查詢來進行驗證。 能否觸發 真正的index only scan 還需要看 visibility map 中的 bit 位圖的信息。
visibility map這塊可以參考之前的文章: https://cdn.modb.pro/db/447177
執行計劃中 Heap Fetches: 0 表示沒有回表取數據, 存在2種情況:
1)通過索引判斷真的沒有數據,所以不需要回表
2)通過索引判斷真的有數據,再一次根據VM 判斷,全部元祖是新的,所以從索引中就能獲得最新的數據, 所以不需要回表 \
dbtest@[local:/tmp]:1992=#113927 explain analyze select name from tab2 where name = 'hello PG!!' limit 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
---
Limit (cost=0.42..4.44 rows=1 width=10) (actual time=0.028..0.029 rows=0 loops=1)
-> Index Only Scan using idx_name_tab2 on tab2 (cost=0.42..4.44 rows=1 width=10) (actual time=0.027..0.028 rows=0 loops=
1)
Index Cond: (name = 'hello PG!!'::text)
Heap Fetches: 0
Planning Time: 0.070 ms
Execution Time: 0.044 ms
(6 rows)
Bitmap Heap Scan 索引位圖掃描 這個一般發生在觸發索引存在 or 條件的情況下, 建立一張 bitmap 來尋找想要的記錄
dbtest@[local:/tmp]:1992=#113927 explain analyze select name from tab2 where name = 'hello oracle' or name = 'hello mysql' ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tab2 (cost=8.87..12.88 rows=1 width=10) (actual time=0.046..0.047 rows=0 loops=1)
Recheck Cond: (((name)::text = 'hello oracle'::text) OR ((name)::text = 'hello mysql'::text))
-> BitmapOr (cost=8.87..8.87 rows=1 width=0) (actual time=0.045..0.045 rows=0 loops=1)
-> Bitmap Index Scan on idx_name_tab2 (cost=0.00..4.43 rows=1 width=0) (actual time=0.035..0.035 rows=0 loops=1)
Index Cond: ((name)::text = 'hello oracle'::text)
-> Bitmap Index Scan on idx_name_tab2 (cost=0.00..4.43 rows=1 width=0) (actual time=0.008..0.008 rows=0 loops=1)
Index Cond: ((name)::text = 'hello mysql'::text)
Planning Time: 0.081 ms
Execution Time: 0.086 ms
(9 rows)
我們再看一下,表與表的連接方式:
Nested Loops 嵌套循環查詢連接 基本上和ORACLE的 nested loop 是無差別的, 適合大小表連接,小表作為驅動表,以觸發索引訪問的方式來訪問大表。
當然這里說的小表不一定是表本身就是小表,也有可能是經過索引過濾后的相對較小的結果集。
適合OLTP 場景, 毫秒級返回少量數據
dbtest@[local:/tmp]:1992=#113927 create table tt1 (id int, name varchar(200), pid int);
CREATE TABLE
dbtest@[local:/tmp]:1992=#113927 create table tt2 (id int, name varchar(200));
CREATE TABLE
dbtest@[local:/tmp]:1992=#113927 insert into tt1 values (generate_series(1,1000),'hello pg',generate_series(1,1000));
INSERT 0 1000
dbtest@[local:/tmp]:1992=#113927 insert into tt2 values (generate_series(1,100000),'hello pg fans');
INSERT 0 100000
dbtest@[local:/tmp]:1992=#113927 create index concurrently idx_tt1_name on tt1 (name);
CREATE INDEX
dbtest@[local:/tmp]:1992=#113927 create index concurrently idx_tt2_id on tt2 (id);
CREATE INDEX
dbtest@[local:/tmp]:1992=#113927 explain analyze select * from tt1, tt2 where tt1.pid = tt2.id and tt1.name = 'hello mysql'; QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.44..12.49 rows=1 width=35) (actual time=0.007..0.007 rows=0 loops=1)
-> Index Scan using idx_tt1_name on tt1 (cost=0.15..4.17 rows=1 width=17) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: ((name)::text = 'hello mysql'::text)
-> Index Scan using idx_tt2_id on tt2 (cost=0.29..8.31 rows=1 width=18) (never executed)
Index Cond: (id = tt1.pid)
Planning Time: 0.262 ms
Execution Time: 0.030 ms
(7 rows)
Merge Join 連接:一般發生在連接條件是需要進行排序的連接(顯示指定order by ),或者是連接條件是索引(默認排序)的情況,
2個表可以同時 parallel 進行掃描,然后進行順序連接
情況1: 觸發索引排序
dbtest@[local:/tmp]:1992=#113927 create index concurrently idx_id_tt1 on tt1(id);
CREATE INDEX
dbtest@[local:/tmp]:1992=#113927 create index concurrently idx_id_tt2 on tt2(id);
CREATE INDEX
dbtest@[local:/tmp]:1992=#113927 explain analyze select * from tt1, tt2 where tt1.id = tt2.id ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
-----
Merge Join (cost=0.66..94.79 rows=1000 width=35) (actual time=0.019..0.595 rows=1000 loops=1)
Merge Cond: (tt1.id = tt2.id)
-> Index Scan using idx_id_tt1 on tt1 (cost=0.28..45.27 rows=1000 width=17) (actual time=0.006..0.147 rows=1000 loops=1)
-> Index Scan using idx_id_tt2 on tt2 (cost=0.29..3244.29 rows=100000 width=18) (actual time=0.007..0.188 rows=1001 loop
s=1)
Planning Time: 0.308 ms
Execution Time: 0.690 ms
(6 rows)
情況2: 沒有索引,顯示 order by 語句觸發
這里我們需要先把 enable_hashjoin 關閉掉 set enable_hashjoin = off;
dbtest@[local:/tmp]:1992=#113927 set enable_hashjoin = off;
SET
dbtest@[local:/tmp]:1992=#113927 explain analyze select * from tt1, tt2 where tt1.id = tt2.id order by tt1.name desc, tt2.name desc;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Sort (cost=11307.98..11310.48 rows=1000 width=35) (actual time=25.473..25.574 rows=1000 loops=1)
Sort Key: tt1.name DESC, tt2.name DESC
Sort Method: quicksort Memory: 103kB
-> Merge Join (cost=11262.73..11282.98 rows=1000 width=35) (actual time=24.525..25.155 rows=1000 loops=1)
Merge Cond: (tt2.id = tt1.id)
-> Sort (cost=11992.82..12242.82 rows=100000 width=18) (actual time=24.242..24.400 rows=1001 loops=1)
Sort Key: tt2.id
Sort Method: external merge Disk: 2752kB
-> Seq Scan on tt2 (cost=0.00..1637.00 rows=100000 width=18) (actual time=0.017..8.195 rows=100000 loops=1)
-> Sort (cost=66.83..69.33 rows=1000 width=17) (actual time=0.273..0.338 rows=1000 loops=1)
Sort Key: tt1.id
Sort Method: quicksort Memory: 87kB
-> Seq Scan on tt1 (cost=0.00..17.00 rows=1000 width=17) (actual time=0.015..0.149 rows=1000 loops=1)
Planning Time: 0.110 ms
Execution Time: 26.113 ms
(15 rows)
Hash Join 連接 : 熟悉oracle 的朋友們 對其應該是十分熟悉, 對于 mysql 數據庫的用戶來說 則是 羨慕. 嫉妒 恨 (mysql 8.0.18 版本已在開始支持 hash join,但是朋友圈的伙伴們大多是還是5.7的版本居多)
HTAP 混動數據庫的最基本的標配之一。 觸發條件你一定很熟悉:等值連接,小表(較小表)作為驅動表,生成HASH 散列表 (內存或者磁盤中)與大表進行連接,適合2張大表進行連接。適合OLTP 的的分析場景
我們可以看到:tt1 作為相對較小的表 在內存 ( Memory Usage: 59kB)中 生成了 1024 hash 桶
dbtest@[local:/tmp]:1992=#113927 explain analyze select * from tt1, tt2 where tt1.id = tt2.id ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Hash Join (cost=29.50..2051.50 rows=1000 width=35) (actual time=0.241..18.570 rows=1000 loops=1)
Hash Cond: (tt2.id = tt1.id)
-> Seq Scan on tt2 (cost=0.00..1637.00 rows=100000 width=18) (actual time=0.013..7.489 rows=100000 loops=1)
-> Hash (cost=17.00..17.00 rows=1000 width=17) (actual time=0.216..0.218 rows=1000 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 59kB
-> Seq Scan on tt1 (cost=0.00..17.00 rows=1000 width=17) (actual time=0.012..0.105 rows=1000 loops=1)
Planning Time: 0.148 ms
Execution Time: 18.657 ms
(8 rows)
最后大家分享一個執行計劃可視化的網站: https://explain.depesz.com/
把 explain 出來的文本,復制粘貼到網站中,點擊submit 即可得到表格化的圖形輸出。

Have a fun ?? !




