PG 之 SQL執行計劃

原創大表哥 2022-08-02

2406

大家好，今天和大家分享是的PG的執行計劃相關的方面內容。

和眾所周知的數據庫ORACLE，MYSQL 一樣，PG 的優化器也是基于CBO的成本計算，來生成理論上是最佳的執行計劃。

不同的數據庫，同樣的 explain 命令，給你帶來執行計劃的詳細輸出：

dbtest@[local:/tmp]:1992=#105846 create table tab (id int , name varchar(200));
CREATE TABLE

dbtest@[local:/tmp]:1992=#105846 insert into tab values (generate_series(1,10000),'PG execution plan');
INSERT 0 10000

dbtest@[local:/tmp]:1992=#105846 \timing
Timing is on.

dbtest@[local:/tmp]:1992=#105846  explain select * from tab;
                        QUERY PLAN                        
----------------------------------------------------------
Seq Scan on tab  (cost=0.00..164.00 rows=10000 width=22)
(1 row)


Time: 0.566 ms

explain 命令后面可以跟隨不同的參數，含義如下：

EXPLAIN [ ( option [, ...] ) ] statement

ANALYZE [ boolean ] : 通過實際執行SQL 來獲得真實的執行計劃，每一步返回的耗時時間和行數都是真實的

dbtest@[local:/tmp]:1992=#105846  explain analyze select * from tab;
                                               QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
Seq Scan on tab  (cost=0.00..164.00 rows=10000 width=22) (actual time=0.008..0.565 rows=10000 loops=1)
Planning Time: 0.035 ms
Execution Time: 0.870 ms
(3 rows)


Time: 1.221 ms

VERBOSE [ boolean ]：輸出更為詳細的信息：比如 Query Identifier 這個重要的屬性類似于mysql 的SQL digest 或者是 oracle 的SQL_ID
這個 Query Identifier 與 pg_stat_statements 插件里面是一樣的

dbtest@[local:/tmp]:1992=#105846  explain analyze verbose select * from tab;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
Seq Scan on public.tab  (cost=0.00..164.00 rows=10000 width=22) (actual time=0.009..0.731 rows=10000 loops=1)
   Output: id, name
Query Identifier: 4997534032644374154
Planning Time: 0.038 ms
Execution Time: 1.123 ms
(5 rows)


Time: 1.499 ms

COSTS [ boolean ]：顯示 cost 成本，這個默認就是打開的
BUFFERS [ boolean ]：顯示內存以及磁盤的讀寫情況， Buffers: shared hit=64 表示內存中的 64個 page 全部命中，直接從磁盤查出數據
（select pg_size_pretty(pg_relation_size(‘tab’)); 512 kB/8kB = 64 pages ）

dbtest@[local:/tmp]:1992=#105846  explain (analyze  true , verbose  true ,buffers true ) select * from tab;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
Seq Scan on public.tab  (cost=0.00..164.00 rows=10000 width=22) (actual time=0.017..2.605 rows=10000 loops=1)
   Output: id, name
   Buffers: shared hit=64
Query Identifier: 4997534032644374154
Planning Time: 0.087 ms
Execution Time: 4.117 ms
(6 rows)


Time: 4.695 ms

dbtest@[local:/tmp]:1992=#105846 select pg_size_pretty(pg_relation_size('tab'));
pg_size_pretty
----------------
512 kB
(1 row)


Time: 0.315 ms

WAL [ boolean ]：對WAL 日志寫入信息的統計。一般與analyze 聯合使用，達到SQL真實運下，WAL的信息準確性的目的。這個參數是在PG 13版本引入的。

我們測試一下，插入100萬的數據產生的WAL 日志的大?。? WAL: records=1000000 bytes=69000000 大致是65M

dbtest@[local:/tmp]:1992=#113927 select 69000000/1024/1024 as "WAL size(MB)";
WAL size(MB)
--------------
           65
(1 row)

dbtest@[local:/tmp]:1992=#113927 create table tab2(id int, name varchar(200));
CREATE TABLE
dbtest@[local:/tmp]:1992=#113927 explain (analyze,wal) insert into tab2 values (generate_series(1,1000000),'hello PG!');
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
Insert on tab2  (cost=0.00..5000.02 rows=0 width=0) (actual time=939.011..939.012 rows=0 loops=1)
   WAL: records=1000000 bytes=69000000
   ->  ProjectSet  (cost=0.00..5000.02 rows=1000000 width=422) (actual time=0.003..93.902 rows=1000000 loops=1)
         ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
Planning Time: 0.046 ms
Execution Time: 939.047 ms
(6 rows)

FORMAT { TEXT | XML | JSON | YAML }：嘗試輸出格式的多樣性, 支持 YAML 這個格式確實是有點驚艷

dbtest@[local:/tmp]:1992=#105846  explain (analyze  true , verbose  true ,buffers true, format YAML ) select * from tab;
               QUERY PLAN                
-----------------------------------------
- Plan:                                +
     Node Type: "Seq Scan"              +
     Parallel Aware: false              +
     Async Capable: false               +
     Relation Name: "tab"               +
     Schema: "public"                   +
     Alias: "tab"                       +
     Startup Cost: 0.00                 +
     Total Cost: 228.00                 +
     Plan Rows: 10000                   +
     Plan Width: 17                     +
     Actual Startup Time: 0.054         +
     Actual Total Time: 0.862           +
     Actual Rows: 10000                 +
     Actual Loops: 1                    +
     Output:                            +
       - "id"                           +
       - "name"                         +

   Execution Time: 1.392
(1 row)

上面是簡單的介紹了一下 explain 的選項參數，我們接下來看一下輸出的信息含義

cost 成本分為：起始成本和總成本 cost=0.00…164.00

返回的行數： rows = 10000
width : 返回的列的寬度 width=17

actual time=0.073…1.135 SQL解析的時間： 0.073， SQL 的總時間：1.135
實際返回的行數： rows=10000
循環的次數： loops=1 單表查詢所以循環的次數是 1

dbtest@[local:/tmp]:1992=#121533 explain analyze select * from tab ;
                                               QUERY PLAN                                               
--------------------------------------------------------------------------------------------------------
Seq Scan on tab  (cost=0.00..164.00 rows=10000 width=22) (actual time=0.008..0.641 rows=10000 loops=1)
Planning Time: 0.036 ms
Execution Time: 0.946 ms
(3 rows)

接下來我們看一下 cost 相關的成本是如何計算的？

計算公式來源于官方文檔： (disk pages read * seq_page_cost) + (rows scanned * cpu_tuple_cost).

SELECT relpages, reltuples FROM pg_class WHERE relname = ‘tab’; --得到 64個 page 和 10000 個元祖

seq_page_cost = 1 ， cpu_tuple_cost = 0.1

(disk pages read * seq_page_cost) + (rows scanned * cpu_tuple_cost) = 64 * 1 + 10000 * 0.01 = 164

和 (cost=0.00…164.00 rows=10000 width=22) 是相互吻合的

dbtest@[local:/tmp]:1992=#121533 SELECT relpages, reltuples FROM pg_class WHERE relname = 'tab';
relpages | reltuples
----------+-----------
       64 |     10000
(1 row)

dbtest@[local:/tmp]:1992=#121533 show seq_page_cost;
seq_page_cost
---------------
1
(1 row)

dbtest@[local:/tmp]:1992=#121533 show cpu_tuple_cost;
cpu_tuple_cost
----------------
0.01
(1 row)

如果是帶有 where 過濾條件的呢？成本的計算公式是在原有的cost 基礎之上 + Filter 的成本（cpu_operator_cost * rows）

cpu_operator_cost : 默認是 0.0025
rows : pg_class 表中的 reltuples 屬性是 10000
原有的cost ： (disk pages read * seq_page_cost) + (rows scanned * cpu_tuple_cost) = 64 * 1 + 10000 * 0.01 = 164
Filter 的成本是 cpu_operator_cost * rows = 10000 +* 0.0025 = 25

所以總的cost 是 164 + 25 = 189

dbtest@[local:/tmp]:1992=#113927 explain analyze select * from tab where name ~ 'test%';
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
Seq Scan on tab  (cost=0.00..189.00 rows=1 width=22) (actual time=10.776..10.777 rows=0 loops=1)
   Filter: ((name)::text ~ 'test%'::text)
   Rows Removed by Filter: 10000
Planning Time: 0.132 ms
Execution Time: 10.797 ms
(5 rows)



dbtest@[local:/tmp]:1992=#121533 show cpu_operator_cost
dbtest-# ;
cpu_operator_cost
-------------------
0.0025
(1 row)

上面我們簡單的了解一下 cost 是如何計算的，接下來我們看一下執行計劃中表的訪問方式和表之間的連接方式：

表的訪問方式：
1）Sequential Scan 全表掃描
2） Index Scan 索引掃描
3) Index Only Scan 覆蓋索引掃描
4) Bitmap Heap Scan 索引位圖掃描

表與表的連接方式：

Nested Loops 嵌套循環查詢連接
Merge Join 連接
3)Hash Join 連接

Sequential Scan 全表掃描，一般是發生在沒有可能觸發索引（或者是索引選擇率很差）的情況下，
一般適合超級小表，或者在OLAP 分析場景下，需要掃描大量數據

執行計劃信息： Seq Scan on 表名

dbtest@[local:/tmp]:1992=#113927 explain analyze select * from tab2;
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
Seq Scan on tab2  (cost=0.00..15406.00 rows=1000000 width=14) (actual time=0.007..71.424 rows=1000000 loops=1)
Planning Time: 0.145 ms
Execution Time: 109.381 ms
(3 rows)

Index Scan 索引掃描，一般是發生在沒有可能觸發索引（或者是索引選擇率很高，一般在5%一下）的情況下，
適合OLTP 高并發場景，必須毫秒級別返回數據

dbtest@[local:/tmp]:1992=#113927 create index concurrently idx_name_tab2 on tab2(name);
CREATE INDEX

dbtest@[local:/tmp]:1992=#113927 create index concurrently idx_name_tab2 on tab2(name);
CREATE INDEX
dbtest@[local:/tmp]:1992=#113927 explain analyze select * from tab2 where name = 'jason' limit 10;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
Limit  (cost=0.42..4.44 rows=1 width=14) (actual time=0.015..0.015 rows=0 loops=1)
   ->  Index Scan using idx_name_tab2 on tab2  (cost=0.42..4.44 rows=1 width=14) (actual time=0.014..0.014 rows=0 loops=1)
         Index Cond: ((name)::text = 'jason'::text)
Planning Time: 0.182 ms
Execution Time: 0.030 ms
(5 rows)

Index Only Scan 覆蓋索引掃描，一般發生在 select 的列信息包含在索引之中。這里值得注意的是和MYSQL 不同， PG 由于特殊的MVCC機制，如果vacuum 不及時的話，
覆蓋索引依然會回表查詢來進行驗證。能否觸發真正的index only scan 還需要看 visibility map 中的 bit 位圖的信息。
visibility map這塊可以參考之前的文章： https://cdn.modb.pro/db/447177

執行計劃中 Heap Fetches: 0 表示沒有回表取數據，存在2種情況：
1）通過索引判斷真的沒有數據，所以不需要回表
2）通過索引判斷真的有數據，再一次根據VM 判斷，全部元祖是新的，所以從索引中就能獲得最新的數據，所以不需要回表 \

dbtest@[local:/tmp]:1992=#113927 explain analyze select name from tab2 where name = 'hello PG!!' limit 10;
                                                           QUERY PLAN                                                        
   
-----------------------------------------------------------------------------------------------------------------------------
---
Limit  (cost=0.42..4.44 rows=1 width=10) (actual time=0.028..0.029 rows=0 loops=1)
   ->  Index Only Scan using idx_name_tab2 on tab2  (cost=0.42..4.44 rows=1 width=10) (actual time=0.027..0.028 rows=0 loops=
1)
         Index Cond: (name = 'hello PG!!'::text)
         Heap Fetches: 0
Planning Time: 0.070 ms
Execution Time: 0.044 ms
(6 rows)

Bitmap Heap Scan 索引位圖掃描這個一般發生在觸發索引存在 or 條件的情況下，建立一張 bitmap 來尋找想要的記錄

dbtest@[local:/tmp]:1992=#113927 explain analyze select name from tab2 where name = 'hello oracle' or name = 'hello mysql' ;
                                                         QUERY PLAN                                                         
----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tab2  (cost=8.87..12.88 rows=1 width=10) (actual time=0.046..0.047 rows=0 loops=1)
   Recheck Cond: (((name)::text = 'hello oracle'::text) OR ((name)::text = 'hello mysql'::text))
   ->  BitmapOr  (cost=8.87..8.87 rows=1 width=0) (actual time=0.045..0.045 rows=0 loops=1)
         ->  Bitmap Index Scan on idx_name_tab2  (cost=0.00..4.43 rows=1 width=0) (actual time=0.035..0.035 rows=0 loops=1)
               Index Cond: ((name)::text = 'hello oracle'::text)
         ->  Bitmap Index Scan on idx_name_tab2  (cost=0.00..4.43 rows=1 width=0) (actual time=0.008..0.008 rows=0 loops=1)
               Index Cond: ((name)::text = 'hello mysql'::text)
Planning Time: 0.081 ms
Execution Time: 0.086 ms
(9 rows)

我們再看一下，表與表的連接方式：

Nested Loops 嵌套循環查詢連接基本上和ORACLE的 nested loop 是無差別的，適合大小表連接，小表作為驅動表，以觸發索引訪問的方式來訪問大表。
當然這里說的小表不一定是表本身就是小表，也有可能是經過索引過濾后的相對較小的結果集。

適合OLTP 場景，毫秒級返回少量數據

dbtest@[local:/tmp]:1992=#113927 create table tt1 (id int, name varchar(200), pid int);
CREATE TABLE
dbtest@[local:/tmp]:1992=#113927 create table tt2 (id int, name varchar(200));
CREATE TABLE
dbtest@[local:/tmp]:1992=#113927 insert into tt1 values (generate_series(1,1000),'hello pg',generate_series(1,1000));
INSERT 0 1000                                                                     
dbtest@[local:/tmp]:1992=#113927 insert into tt2 values (generate_series(1,100000),'hello pg fans');
INSERT 0 100000
dbtest@[local:/tmp]:1992=#113927 create index concurrently idx_tt1_name on tt1 (name);
CREATE INDEX
dbtest@[local:/tmp]:1992=#113927 create index concurrently idx_tt2_id  on tt2 (id);
CREATE INDEX
dbtest@[local:/tmp]:1992=#113927 explain  analyze select * from tt1, tt2 where tt1.pid = tt2.id and tt1.name = 'hello mysql';                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=0.44..12.49 rows=1 width=35) (actual time=0.007..0.007 rows=0 loops=1)
   ->  Index Scan using idx_tt1_name on tt1  (cost=0.15..4.17 rows=1 width=17) (actual time=0.006..0.006 rows=0 loops=1)
         Index Cond: ((name)::text = 'hello mysql'::text)
   ->  Index Scan using idx_tt2_id on tt2  (cost=0.29..8.31 rows=1 width=18) (never executed)
         Index Cond: (id = tt1.pid)
Planning Time: 0.262 ms
Execution Time: 0.030 ms
(7 rows)

Merge Join 連接：一般發生在連接條件是需要進行排序的連接（顯示指定order by ），或者是連接條件是索引（默認排序）的情況，
2個表可以同時 parallel 進行掃描，然后進行順序連接

情況1：觸發索引排序

dbtest@[local:/tmp]:1992=#113927 create index concurrently idx_id_tt1 on tt1(id);
CREATE INDEX
dbtest@[local:/tmp]:1992=#113927 create index concurrently idx_id_tt2 on tt2(id);
CREATE INDEX
dbtest@[local:/tmp]:1992=#113927 explain  analyze select *  from tt1, tt2 where tt1.id = tt2.id ;
                                                            QUERY PLAN                                                       
     
-----------------------------------------------------------------------------------------------------------------------------
-----
Merge Join  (cost=0.66..94.79 rows=1000 width=35) (actual time=0.019..0.595 rows=1000 loops=1)
   Merge Cond: (tt1.id = tt2.id)
   ->  Index Scan using idx_id_tt1 on tt1  (cost=0.28..45.27 rows=1000 width=17) (actual time=0.006..0.147 rows=1000 loops=1)
   ->  Index Scan using idx_id_tt2 on tt2  (cost=0.29..3244.29 rows=100000 width=18) (actual time=0.007..0.188 rows=1001 loop
s=1)
Planning Time: 0.308 ms
Execution Time: 0.690 ms
(6 rows)

情況2：沒有索引，顯示 order by 語句觸發

這里我們需要先把 enable_hashjoin 關閉掉 set enable_hashjoin = off;

dbtest@[local:/tmp]:1992=#113927 set enable_hashjoin = off;
SET

dbtest@[local:/tmp]:1992=#113927 explain  analyze select *  from tt1, tt2 where tt1.id = tt2.id order by tt1.name desc, tt2.name desc;
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
Sort  (cost=11307.98..11310.48 rows=1000 width=35) (actual time=25.473..25.574 rows=1000 loops=1)
   Sort Key: tt1.name DESC, tt2.name DESC
   Sort Method: quicksort  Memory: 103kB
   ->  Merge Join  (cost=11262.73..11282.98 rows=1000 width=35) (actual time=24.525..25.155 rows=1000 loops=1)
         Merge Cond: (tt2.id = tt1.id)
         ->  Sort  (cost=11992.82..12242.82 rows=100000 width=18) (actual time=24.242..24.400 rows=1001 loops=1)
               Sort Key: tt2.id
               Sort Method: external merge  Disk: 2752kB
               ->  Seq Scan on tt2  (cost=0.00..1637.00 rows=100000 width=18) (actual time=0.017..8.195 rows=100000 loops=1)
         ->  Sort  (cost=66.83..69.33 rows=1000 width=17) (actual time=0.273..0.338 rows=1000 loops=1)
               Sort Key: tt1.id
               Sort Method: quicksort  Memory: 87kB
               ->  Seq Scan on tt1  (cost=0.00..17.00 rows=1000 width=17) (actual time=0.015..0.149 rows=1000 loops=1)
Planning Time: 0.110 ms
Execution Time: 26.113 ms
(15 rows)

Hash Join 連接：熟悉oracle 的朋友們對其應該是十分熟悉，對于 mysql 數據庫的用戶來說則是羨慕. 嫉妒恨 (mysql 8.0.18 版本已在開始支持 hash join，但是朋友圈的伙伴們大多是還是5.7的版本居多)
HTAP 混動數據庫的最基本的標配之一。觸發條件你一定很熟悉：等值連接，小表（較小表）作為驅動表，生成HASH 散列表（內存或者磁盤中）與大表進行連接，適合2張大表進行連接。適合OLTP 的的分析場景

我們可以看到：tt1 作為相對較小的表在內存（ Memory Usage: 59kB）中生成了 1024 hash 桶

dbtest@[local:/tmp]:1992=#113927 explain  analyze select *  from tt1, tt2 where tt1.id = tt2.id ;
                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
Hash Join  (cost=29.50..2051.50 rows=1000 width=35) (actual time=0.241..18.570 rows=1000 loops=1)
   Hash Cond: (tt2.id = tt1.id)
   ->  Seq Scan on tt2  (cost=0.00..1637.00 rows=100000 width=18) (actual time=0.013..7.489 rows=100000 loops=1)
   ->  Hash  (cost=17.00..17.00 rows=1000 width=17) (actual time=0.216..0.218 rows=1000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 59kB
         ->  Seq Scan on tt1  (cost=0.00..17.00 rows=1000 width=17) (actual time=0.012..0.105 rows=1000 loops=1)
Planning Time: 0.148 ms
Execution Time: 18.657 ms
(8 rows)

最后大家分享一個執行計劃可視化的網站： https://explain.depesz.com/

把 explain 出來的文本，復制粘貼到網站中，點擊submit 即可得到表格化的圖形輸出。

Have a fun ?? !

「喜歡這篇文章，您的關注和贊賞是給作者最好的鼓勵」

關注作者

特色一级强游戏,海奥华预言免费阅读,51漫画兑换码,美女裸体无遮挡永久免费观看网站,lubuntu线路检测入口

PG 之 SQL執行計劃

評論