
大家好,今天和大家聊聊用PG的分區(qū)代替ORACLE的可能性。
眾做周知,ORACLE 作為業(yè)界數(shù)據(jù)庫(kù)的旗艦級(jí)產(chǎn)品,各項(xiàng)功能都是完善的, 近些年在國(guó)內(nèi)某種大潮下,國(guó)產(chǎn)數(shù)據(jù)庫(kù)或者開(kāi)源數(shù)據(jù)庫(kù)開(kāi)始提供一些替代
ORACLE的解決方案。 目前OLTP這塊, 開(kāi)源的MYSQL和PG做為了2個(gè)主力根的分支。 國(guó)產(chǎn)數(shù)據(jù)庫(kù)大部分都是基于這2個(gè)根分支進(jìn)行了研發(fā)或者魔改。
國(guó)內(nèi)早期的許多項(xiàng)目(非互聯(lián)網(wǎng)場(chǎng)景,大多數(shù)為傳統(tǒng)行業(yè)的項(xiàng)目),嚴(yán)重依賴于ORACLE 數(shù)據(jù)庫(kù)的分區(qū),并行查詢,存儲(chǔ)過(guò)程,hint, 強(qiáng)力的優(yōu)化器功能(上千行的復(fù)雜大SQL,幾十張表的join )等等等… 使得再去O 的道路上困難重重。
作為重度依賴oracle 的項(xiàng)目對(duì)于 數(shù)據(jù)庫(kù)選型來(lái)說(shuō) :
mysql 這種小快靈的開(kāi)發(fā)部署方式(devops),以及業(yè)界默認(rèn)的規(guī)則 不要寫(xiě)復(fù)雜的SQL(表連接不要超過(guò)3張),不要分區(qū),短事務(wù) 等等,似乎對(duì)于老項(xiàng)目的改造實(shí)現(xiàn)起來(lái)很難。
(程序員往往就是喜歡照著翻譯,上萬(wàn)行的PLSQL 翻譯成 短小的mysql 代碼 放到JAVA程序中 基本上無(wú)任何希望。。。)
PG 雖然在國(guó)外有著幾十年的歷史,但是與兄弟數(shù)據(jù)庫(kù)mysql比起來(lái), 國(guó)內(nèi)起步較晚,國(guó)內(nèi)程序員接受程度普遍不是很高。
在國(guó)外來(lái)說(shuō),PG已經(jīng)成為了最受開(kāi)發(fā)者歡迎的數(shù)據(jù)庫(kù)之一。
言歸正傳,我們首先看看PG的分區(qū)歷史:
PG10 之前的版本, PG使用繼承的方式來(lái)實(shí)現(xiàn)分區(qū)表的
從PG版本10 開(kāi)始, PG可以使用聲明式的分區(qū)定義,支持range, list 等主流分區(qū)的方式
從PG版本11開(kāi)始, PG 支持hash key 的分區(qū)
我們先從PG基礎(chǔ)的分區(qū)的定義來(lái)了解一下(本文分享的都是聲明式的分區(qū)定義方式):
Range 范圍分區(qū)
postgres=# create table payment_request (
id serial ,
account_no varchar(50) not null,
pay_amount decimal (6,2),
pay_date date,
pay_status int) partition by range(pay_date);
CREATE TABLE
由于PG沒(méi)有全局索引的概念,所以主鍵或者唯一鍵必須要包含分區(qū)間,否則會(huì)報(bào)錯(cuò): ERROR: unique constraint on partitioned table must include all partitioning columns(這一點(diǎn)和mysql 的分區(qū)表是一致的,ORACLE支持更為強(qiáng)大的全局索引)
postgres=# alter table payment_request add constraint pk_id_paydate primary key (id,pay_date);
ALTER TABLE
表payment_request 只是一個(gè)虛擬表的概念,我們需要添加具體的范圍分區(qū):
這里值得注意的是,對(duì)于范圍分區(qū)來(lái)說(shuō),如果人為的指定范圍錯(cuò)誤,那么創(chuàng)建分區(qū)的時(shí)候也會(huì)報(bào)錯(cuò): ERROR: partition “xxxx” would overlap partition “xxxxxxxxxx”
postgres=# create table payment_request_y2022m08 partition of payment_request
postgres-# for values from ('2022-08-01') to ('2022-09-1');
CREATE TABLE
postgres=# create table peyment_request_y2022m09 partition of payment_request
postgres-# for values from ('2022-08-20') to ('2022-09-30');
ERROR: partition "peyment_request_y2022m09" would overlap partition "payment_request_y2022m08"
LINE 2: for values from ('2022-08-20') to ('2022-09-30');
正確指定范圍,手動(dòng)創(chuàng)建分區(qū): 2022-09 和 2022-09 2 個(gè)分區(qū), 并在2個(gè)分區(qū)內(nèi)插入數(shù)據(jù)
postgres=# create table payment_request_y2022m08 partition of payment_request
postgres-# for values from ('2022-08-01') to ('2022-09-1');
CREATE TABLE
postgres=# create table payment_request_y2022m09 partition of payment_request
postgres-# for values from ('2022-09-01') to ('2022-10-01');
CREATE TABLE
postgres=# insert into payment_request (account_no,pay_amount,pay_date,pay_status) values ('10000002',3500,'2022-09-01',1);
INSERT 0 1
postgres=# insert into payment_request (account_no,pay_amount,pay_date,pay_status) values ('10000001',2500,'2022-08-01',1);
INSERT 0 1
postgres=# select * from payment_request_y2022m08;
id | account_no | pay_amount | pay_date | pay_status
----+------------+------------+------------+------------
4 | 10000001 | 2500.00 | 2022-08-01 | 1
(1 row)
postgres=# select * from payment_request_y2022m09;
id | account_no | pay_amount | pay_date | pay_status
----+------------+------------+------------+------------
3 | 10000002 | 3500.00 | 2022-09-01 | 1
(1 row)
從PG11版本開(kāi)始, 數(shù)據(jù)如果要在分區(qū)之間進(jìn)行移動(dòng)的話 ,比如執(zhí)行 update 語(yǔ)句, 默認(rèn)是 支持 enable row movement的。 (Oracle 數(shù)據(jù)庫(kù)的話,需要手動(dòng)開(kāi)啟表級(jí)別的 enable row movement)
早期PG10的版本,跨分區(qū)之間移動(dòng)數(shù)據(jù)的話,會(huì)進(jìn)行 分區(qū)約束的校驗(yàn) partition constraint : 會(huì)返回錯(cuò)誤 : ERROR: new row for relation "xxx " violates partition constraint
postgres=# update payment_request set pay_date = '2022-08-15' where pay_date = '2022-09-01';
UPDATE 1
postgres=# select * from payment_request_y2022m08;
id | account_no | pay_amount | pay_date | pay_status
----+------------+------------+------------+------------
4 | 10000001 | 2500.00 | 2022-08-01 | 1
3 | 10000002 | 3500.00 | 2022-08-15 | 1
(2 rows)
postgres=# select * from payment_request_y2022m09;
id | account_no | pay_amount | pay_date | pay_status
----+------------+------------+----------+------------
(0 rows)
如何查看分區(qū)表, 相對(duì)應(yīng)的子分區(qū)(分區(qū)索引)以及大小?
postgres=# SELECT
postgres-# nmsp_parent.nspname AS parent_schema,
postgres-# parent.relname AS parent,
postgres-# nmsp_child.nspname AS child_schema,
postgres-# child.relname AS child,
postgres-# case
postgres-# when child.relkind = 'r' then 'ordinary table'
postgres-# when child.relkind = 'i' then 'index'
postgres-# when child.relkind = 'S' then 'sequence'
postgres-# when child.relkind = 'v' then 'view'
postgres-# when child.relkind = 'm' then 'materialized view'
postgres-# when child.relkind = 'c' then 'composite type'
postgres-# when child.relkind = 't' then 'TOAST table'
postgres-# when child.relkind = 'f' then 'foreign table'
postgres-# end as child_type,
postgres-# pg_size_pretty(pg_relation_size(child.relname :: varchar)) AS child_size
postgres-# FROM pg_inherits
postgres-# JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
postgres-# JOIN pg_class child ON pg_inherits.inhrelid = child.oid
postgres-# JOIN pg_namespace nmsp_parent ON nmsp_parent.oid = parent.relnamespace
postgres-# JOIN pg_namespace nmsp_child ON nmsp_child.oid = child.relnamespace
postgres-# WHERE parent.relname in ('payment_request','pk_id_paydate') order by parent.relname;
parent_schema | parent | child_schema | child | child_type | child_size
---------------+-----------------+--------------+-------------------------------+----------------+------------
public | payment_request | public | payment_request_y2022m08 | ordinary table | 5888 kB
public | payment_request | public | payment_request_y2022m09 | ordinary table | 592 kB
public | pk_id_paydate | public | payment_request_y2022m08_pkey | index | 2208 kB
public | pk_id_paydate | public | payment_request_y2022m09_pkey | index | 240 kB
(4 rows)
我們還可以利用函數(shù) pg_partition_tree 查看表與分區(qū)之間的關(guān)系:
postgres=# select * from pg_partition_tree('payment_request');
relid | parentrelid | isleaf | level
--------------------------+-----------------+--------+-------
payment_request | | f | 0
payment_request_y2022m09 | payment_request | t | 1
payment_request_y2022m08 | payment_request | t | 1
(3 rows)
函數(shù) pg_partition_root 可以通過(guò)子分區(qū)的名稱,查找到root 表的名稱:
postgres=# select * from pg_partition_root('payment_request_y2022m09');
pg_partition_root
-------------------
payment_request
(1 row)
List 分區(qū)
列表分區(qū)也是我們常見(jiàn)的一種分區(qū)方式,通常作用于 NDV 值很小并且是預(yù)先固定的常量值的屬性類,類似于 城市, 商品類型,訂單狀態(tài)。
我們把之前的付款表加上一列 region 這個(gè)區(qū)域?qū)傩缘牧校?/p>
create table payment_request_list (
id serial ,
account_no varchar(50) not null,
pay_amount decimal (6,2),
pay_date date,
pay_status int,
region varchar(50)
) partition by list(region);
CREATE TABLE
創(chuàng)建4個(gè)分區(qū): 根據(jù)區(qū)域的 東南西北
postgres=# create table payment_request_list_east partition of payment_request_list for values
in ('east');
CREATE TABLE
postgres=# create table payment_request_list_west partition of payment_request_list for values
in ('west');
CREATE TABLE
postgres=# create table payment_request_list_north partition of payment_request_list for values
in ('north');
CREATE TABLE
postgres=# create table payment_request_list_south partition of payment_request_list for values
in ('south');
CREATE TABLE
postgres=# \d+ payment_request_list;
Partitioned table "public.payment_request_list"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
------------+-----------------------+-----------+----------+--------------------------------------------------+----------+-------------+--------------+-------------
id | integer | | not null | nextval('payment_request_list_id_seq'::regclass) | plain | | |
account_no | character varying(50) | | not null | | extended | | |
pay_amount | numeric(6,2) | | | | main | | |
pay_date | date | | | | plain | | |
pay_status | integer | | | | plain | | |
region | character varying(50) | | | | extended | | |
Partition key: LIST (region)
Partitions: payment_request_list_east FOR VALUES IN ('east'),
payment_request_list_north FOR VALUES IN ('north'),
payment_request_list_south FOR VALUES IN ('south'),
payment_request_list_west FOR VALUES IN ('west')
我們嘗試插入一條在 list 列表之外的記錄: 會(huì)返回錯(cuò)誤 no partition of relation “payment_request_list” found for row
postgres=# insert into payment_request_list (account_no,pay_amount,pay_date,pay_status,region) values ('1000005',1500,'2022-09-27',0,'center');
ERROR: no partition of relation "payment_request_list" found for row
DETAIL: Partition key of the failing row contains (region) = (center).
當(dāng)然我們可以手動(dòng)添加一個(gè) list = ‘center’ 的分區(qū) 。
或者我們也可以添加一個(gè) default 分區(qū) ,default partition 可以收納 分區(qū)鍵之前的所有值 。
(PG version 11 開(kāi)始支持 default 分區(qū),list 和 range 均可以添加 default 分區(qū))
postgres=# create table payment_request_list_default partition of payment_request_list default;
CREATE TABLE
postgres=# insert into payment_request_list (account_no,pay_amount,pay_date,pay_status,region) values ('1000005',1500,'2022-09-27',0,'center');
INSERT 0 1
postgres=# \d+ payment_request_list
Partitioned table "public.payment_request_list"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
------------+-----------------------+-----------+----------+--------------------------------------------------+----------+-------------+--------------+-------------
id | integer | | not null | nextval('payment_request_list_id_seq'::regclass) | plain | | |
account_no | character varying(50) | | not null | | extended | | |
pay_amount | numeric(6,2) | | | | main | | |
pay_date | date | | | | plain | | |
pay_status | integer | | | | plain | | |
region | character varying(50) | | | | extended | | |
Partition key: LIST (region)
Partitions: payment_request_list_east FOR VALUES IN ('east'),
payment_request_list_north FOR VALUES IN ('north'),
payment_request_list_south FOR VALUES IN ('south'),
payment_request_list_west FOR VALUES IN ('west'),
payment_request_list_default DEFAULT
hash 哈希分區(qū)
哈希分區(qū)是PG version 11 版本開(kāi)始支持的, 需要哈希的列 必須要數(shù)據(jù)分布均勻,不要有傾斜,否則失去了哈希打散數(shù)據(jù)的意義,數(shù)據(jù)熱點(diǎn)無(wú)法分散。
我們還是以 payment_request 作為案例, ID 作為hash key , 具有唯一性,使得數(shù)據(jù)分布均勻。
postgres=# create table payment_request_hash (
id serial ,
account_no varchar(50) not null,
pay_amount decimal (6,2),
pay_date date,
pay_status int,
region varchar(50)
) partition by hash(id);
CREATE TABLE
手動(dòng)創(chuàng)建4個(gè)分區(qū), 利用 mode 函數(shù) 除以4取余
postgres=# create table payment_request_hash_01 partition of payment_request_hash for values with (modulus 4, remainder 0);
CREATE TABLE
postgres=# create table payment_request_hash_02 partition of payment_request_hash for values with (modulus 4, remainder 1);
CREATE TABLE
postgres=# create table payment_request_hash_03 partition of payment_request_hash for values with (modulus 4, remainder 2);
CREATE TABLE
postgres=# create table payment_request_hash_04 partition of payment_request_hash for values with (modulus 4, remainder 3);
CREATE TABLE
postgres=# \d+ payment_request_hash
Partitioned table "public.payment_request_hash"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
------------+-----------------------+-----------+----------+--------------------------------------------------+----------+-------------+--------------+-------------
id | integer | | not null | nextval('payment_request_hash_id_seq'::regclass) | plain | | |
account_no | character varying(50) | | not null | | extended | | |
pay_amount | numeric(6,2) | | | | main | | |
pay_date | date | | | | plain | | |
pay_status | integer | | | | plain | | |
region | character varying(50) | | | | extended | | |
Partition key: HASH (id)
Partitions: payment_request_hash_01 FOR VALUES WITH (modulus 4, remainder 0),
payment_request_hash_02 FOR VALUES WITH (modulus 4, remainder 1),
payment_request_hash_03 FOR VALUES WITH (modulus 4, remainder 2),
payment_request_hash_04 FOR VALUES WITH (modulus 4, remainder 3)
模擬插入10000條數(shù)據(jù): 查看數(shù)據(jù)分布的情況是很均勻的
postgres=# insert into payment_request_hash select generate_series(1,10000),'100000010',2000,'2022-09-27',1,'east';
INSERT 0 10000
postgres=# select count(1) from payment_request_hash_01;
count
-------
2489
(1 row)
postgres=# select count(1) from payment_request_hash_02;
count
-------
2527
(1 row)
postgres=# select count(1) from payment_request_hash_03;
count
-------
2530
(1 row)
postgres=# select count(1) from payment_request_hash_04;
count
-------
2454
(1 row)
組合分區(qū)
接下來(lái)再看看常用的二級(jí)分區(qū): List + range 的形式 (一級(jí) partition 是 List (region 列), subpartition 是 range (request_date))
PG 原生的語(yǔ)法并不支持直接創(chuàng)建二級(jí)分區(qū)表(國(guó)外商業(yè)版的 EDB postgres 支持),我們可以通過(guò)多次創(chuàng)建分區(qū)表的方式來(lái)間接實(shí)現(xiàn)2級(jí)分區(qū)的功能:
1.創(chuàng)建一級(jí)分區(qū)的定義 – partition by list
postgres=# create table payment_request_compose_list_range partition of payment_request_compose_list for values in ('east') partition by range(pay_date);
CREATE TABLE
2.創(chuàng)建二級(jí)分區(qū)的定義 – partition by range
postgres=# create table payment_request_compose_west_range partition of payment_request_compose_list for values in ('west') partition by range(pay_date);
CREATE TABLE
postgres=# create table payment_request_compose_east_range partition of payment_request_compose_list for values in ('east') partition by range(pay_date);
CREATE TABLE
3.創(chuàng)建具體的二級(jí)分區(qū)表
postgres=# create table payment_request_compose_west_range202208 partition of payment_request_compose_west_range for values from ('2022-08-01') to ('2022-09-01');
CREATE TABLE
postgres=# create table payment_request_compose_west_range202209 partition of payment_request_compose_west_range for values from ('2022-09-01') to ('2022-10-01');
CREATE TABLE
postgres=# create table payment_request_compose_east_range202209 partition of payment_request_compose_east_range for values from ('2022-09-01') to ('2022-10-01');
CREATE TABLE
postgres=# create table payment_request_compose_east_range202208 partition of payment_request_compose_east_range for values from ('2022-08-01') to ('2022-09-01');
CREATE TABLE
4.查詢二級(jí)分區(qū)表之間的關(guān)系,我們可以利用函數(shù) pg_partition_tree , 可以看到 level 0 是一級(jí)分區(qū)表 list 的定義, level1 是 二級(jí)分區(qū)表 range的定義 , level 2 是具體的二級(jí)分區(qū)表
postgres=# select * from pg_partition_tree('payment_request_compose_list');
relid | parentrelid | isleaf | level
------------------------------------------+------------------------------------+--------+-------
payment_request_compose_list | | f | 0
payment_request_compose_west_range | payment_request_compose_list | f | 1
payment_request_compose_east_range | payment_request_compose_list | f | 1
payment_request_compose_west_range202208 | payment_request_compose_west_range | t | 2
payment_request_compose_west_range202209 | payment_request_compose_west_range | t | 2
payment_request_compose_east_range202209 | payment_request_compose_east_range | t | 2
payment_request_compose_east_range202208 | payment_request_compose_east_range | t | 2
(7 rows)
分區(qū)的維護(hù)性
原生的PG分區(qū)的維護(hù)性和ORACLE老大哥比起來(lái)還是存在一定差異的。
a)添加,刪除,truncate 分區(qū)的命令,直接作用于具體的分區(qū)表中, 由于沒(méi)有ORACLE的全局索引的概念, 也就沒(méi)有了全局索引 rebuild 的煩惱.
添加
postgres=# create table payment_request_y2022m10 partition of payment_request for
postgres-# values from ('2022-10-01') to ('2022-11-01');
CREATE TABLE
刪除:
postgres=# drop table payment_request_y2022m10;
DROP TABLE
truncate:
postgres=# truncate table payment_request_y2022m10;
TRUNCATE TABLE
b)PG的交換分區(qū)命令(原生的PG不支持ORACLE的 exchange partition的命令): ATTACH 和 DETACH 分區(qū)
我們用 ATTACH,DETACH 模擬一下 類似exchange partition 的功能:
我們歸檔一下表 payment_request 的 8月份的分區(qū) :
1)detach partition & create new partition
postgres=# alter table payment_request detach partition payment_request_y2022m08;
ALTER TABLE
postgres=# create table payment_request_y2022m08_arch partition of payment_request
postgres-# for values from ('2022-08-01') to ('2022-09-1');
CREATE TABLE
2)detach partition & attach partition
postgres=# create table payment_request_y2022m08 (LIKE payment_request INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
CREATE TABLE
postgres=# alter table payment_request detach partition payment_request_y2022m08_arch;
ALTER TABLE
postgres=# alter table payment_request attach partition payment_request_y2022m08 for values from ('2022-08-01') to ('2022-09-01');
ALTER TABLE
c)如何split 拆分分區(qū), 原生的PG是沒(méi)有split 的 語(yǔ)法的(國(guó)外的EDB的企業(yè)版有支持), 我們只能用如下的步驟來(lái)代替:
我們把一個(gè)月的分區(qū),一分為二的步驟如下: 當(dāng)然生產(chǎn)環(huán)境的話,如果這個(gè)分區(qū)不是冷數(shù)據(jù)(靜態(tài)的數(shù)據(jù))的話,是需要應(yīng)用系統(tǒng)有 outage 的,來(lái)確保的完整性
1)Detach the partition
postgres=# alter table payment_request detach partition payment_request_y2022m09;
ALTER TABLE
2)create new partition
postgres=# create table payment_request_y2022m09_half1 partition of payment_request for values from ('2022-09-01') to ('2022-09-15');
CREATE TABLE
postgres=# create table payment_request_y2022m09_half2 partition of payment_request for values from ('2022-09-15') to ('2022-10-01');
CREATE TABLE
3)load the data into new partition
postgres=# insert into payment_request_y2022m09_half1 select * from payment_request_y2022m09 where pay_date <= '2022-09-15';
INSERT 0 0
postgres=# insert into payment_request_y2022m09_half2 select * from payment_request_y2022m09 where pay_date > '2022-09-15';
INSERT 0 9991
PG 分區(qū)的好處大致是:
1)數(shù)據(jù)的維護(hù),特別針對(duì)數(shù)據(jù)的刪除delete操作,避免產(chǎn)生大量的日志文件(WAL),避免大量死元祖的產(chǎn)生(表膨脹),避免頻繁觸發(fā)數(shù)據(jù)庫(kù) auto vacuum/vacuum的 backgroud 進(jìn)程
2)性能上的優(yōu)化:大家常說(shuō)的 partition purning (分區(qū)消除),大表轉(zhuǎn)化為小表的思想
3)并行查詢
4)避免單表大小32TB的限制
分區(qū)的限制和缺陷:
1)分區(qū)鍵對(duì)于業(yè)務(wù)系統(tǒng)(業(yè)務(wù)代碼)的侵入性,或者說(shuō) 只有在特定的業(yè)務(wù)下選擇業(yè)務(wù)分區(qū)鍵,才能最有效發(fā)揮分區(qū)的優(yōu)勢(shì)
2)PG目前無(wú)interval 類型的自動(dòng)擴(kuò)展分區(qū),需要自行編寫(xiě)腳本來(lái)維護(hù)和監(jiān)控。
3) 不支持全局索引,應(yīng)用端之前如果有依賴全局唯一索引做校驗(yàn)的話保證數(shù)據(jù)的完整性的話,需要做代碼改動(dòng)
4)對(duì)于非分區(qū)表到分區(qū)表的轉(zhuǎn)換, 由于目前PG不支持類似ORACLE基于MV LOG的在線重定義, 需要編寫(xiě)觸發(fā)器同步數(shù)據(jù)的方案或者干脆選擇系統(tǒng)維護(hù)窗口來(lái)實(shí)現(xiàn)CTAS
最后, 總結(jié)一下 ORACLE 分區(qū)表和PG分區(qū)表的功能對(duì)照表 (來(lái)自于 AWS 的官方,個(gè)人簡(jiǎn)單的翻譯了一下)

Have a fun! ??




