赞
踩
在PG中,磁盘存储和内存中的最小管理单位都是page,也是通常所说的block。一般PG页的大小为8K,在源码编译时可以设置。此后都不可更改,因为许多PG内存结构设计都是以此为基础的
PG块的定义文件为src/include/storage/bufpage.h
PAGE的结构如下:
参考网上整理之后的结构图如下:
Page Header的定义如下:
- lsn:记录最后一次对页修改的xlog记录
- checksum:页面的校验和,用于判断当前页是否完整
- flags:(指示当前页的状态)
- lower:本页空闲位置的起始指针
- uper:本业空闲位置的结束指针
- special:页预留的位置
- pagesize:页面大小
- version:当前版本
- prune_xid:最后一次删除或更新的xid
- 页后面存储的是元组(tuple)信息的数据(表的数据行,一个元组信息就是一行),也就是下一节的page item
ItemIdData的定义如下:
itemid.h头文件
参考网友整理图
ItemIdData之后之后是tuple,tuple的组成为tuple header+data
首先tuple header的定义文件为access/htup_details.h
其中HeapTupleFields的定义如下:
参考网上整理图如下:
tuple header之后就是data数据了。
1)、空间使用测试
首先确认您是否拥有了 pageinspect 这个 extension
如果没有,使用create extension pageinspect命令安装插件。
- postgres=# drop table if EXISTS test1;
- DROP TABLE
- postgres=# create table test1(id int);
- CREATE TABLE
- postgres=# insert into test1 values(1);
- INSERT 0 1
- postgres=# select * from page_header(get_raw_page('test1', 0));
- lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
- ------------+----------+-------+-------+-------+---------+----------+---------+-----------
- 2/DB128048 | 0 | 0 | 28 | 8160 | 8192 | 8192 | 4 | 0
- (1 row)
- postgres=# insert into test1 values(2);
- INSERT 0 1
- postgres=# select * from page_header(get_raw_page('test1', 0));
- lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
- ------------+----------+-------+-------+-------+---------+----------+---------+-----------
- 2/DB1741C0 | 0 | 0 | 32 | 8128 | 8192 | 8192 | 4 | 0
- (1 row)
通过上面可以看到,每插入一行数据,lower的值就会增加4(itemIdData),其中page header的大小为24,插入两行数据之后增加了两个ItemIdData。相应的upper地址会向上移动。
查询tuple header的
select * from heap_page_items(get_raw_page('test1',0));
- lp:这是插件自己定义的列,在源码中其实没有,这个是项指针的顺序
- lp_off: tuple在页面中的位置
- lp_len:tuple的实际长度(tuple header + data)
- t_xmin,t_xmax: 插入,删除和更新时的事物id,插入时xmin写入当前事物id,删除时xmax写入事物id。更新也是先删除再插入
- t_field3:
- t_ctid:物理id,或者关联更新之后的物理ID
2)、更新第二行数据
- postgres=# update test1 set id = 3 where id = 2;
- UPDATE 1
- postgres=# select * from page_header(get_raw_page('test1', 0));
- lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
- ------------+----------+-------+-------+-------+---------+----------+---------+-----------
- 2/DB174418 | 0 | 0 | 36 | 8096 | 8192 | 8192 | 4 | 2097527
- (1 row)
-
- postgres=# select * from heap_page_items(get_raw_page('test1',0));
- lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
- ----+--------+----------+--------+---------+---------+----------+--------+-------------+------------+--------+--------+-------+------------
- 1 | 8160 | 1 | 28 | 2097522 | 0 | 0 | (0,1) | 1 | 2304 | 24 | | | \x01000000
- 2 | 8128 | 1 | 28 | 2097526 | 2097527 | 0 | (0,3) | 16385 | 256 | 24 | | | \x02000000
- 3 | 8096 | 1 | 28 | 2097527 | 0 | 0 | (0,3) | 32769 | 10240 | 24 | | | \x03000000
- (3 rows)
-
- postgres=#
更新之后发现,lower增加了4,说明新增了一个ItemIddata,同时upper的值减少也说明了free space空间在减少。查询heap_page_items发现lp的值从原来的2增加到了3,多了一行数据。原来ID=2的lp=2的行还在,t_xmax=2097527和lp=3的t_xmin的值一样,并且lp=2和lp=3的t_ctid一样,说明是update test1 set id = 3 where id = 2;更新之后的新数据为lp=3的行,被更新之前的行还在,只是数据指向了lp=3的数据(t_ctid=(0,3))。
3)、再插入一行新数据
-
- postgres=# insert into test1 values (4);
- INSERT 0 1
- postgres=# select * from page_header(get_raw_page('test1', 0));
- lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
- ------------+----------+-------+-------+-------+---------+----------+---------+-----------
- 2/DB174688 | 0 | 0 | 40 | 8064 | 8192 | 8192 | 4 | 2097527
- (1 row)
-
- postgres=# select * from heap_page_items(get_raw_page('test1',0));
- lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
- ----+--------+----------+--------+---------+---------+----------+--------+-------------+------------+--------+--------+-------+------------
- 1 | 8160 | 1 | 28 | 2097522 | 0 | 0 | (0,1) | 1 | 2304 | 24 | | | \x01000000
- 2 | 8128 | 1 | 28 | 2097526 | 2097527 | 0 | (0,3) | 16385 | 1280 | 24 | | | \x02000000
- 3 | 8096 | 1 | 28 | 2097527 | 0 | 0 | (0,3) | 32769 | 10496 | 24 | | | \x03000000
- 4 | 8064 | 1 | 28 | 2097528 | 0 | 0 | (0,4) | 1 | 2048 | 24 | | | \x04000000
- (4 rows)
-
- postgres=#
通过heap_page_items新增了一行数据,被更新lp=2的行对应的空间没有被重复使用。其实这里使用到的就是postgres的MVCC机制,如果表频繁的发生更新、删掉等。vacuum没有即使的清理旧版本的信息,就会导致表膨胀。关于vacuum什么时候会触发清理旧数据,可以参考链接:postgres统计信息规则和触发条件分析_qxy0503的博客-CSDN博客_postgres 统计信息(postgres统计信息规则和触发条件分析)。
4)、手动对表test1做vacuum操作,然后再插入数据观察
- postgres=# vacuum test1;
- VACUUM
- postgres=# select * from page_header(get_raw_page('test1', 0));
- lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
- ------------+----------+-------+-------+-------+---------+----------+---------+-----------
- 2/DB176940 | 0 | 4 | 40 | 8096 | 8192 | 8192 | 4 | 0
- (1 row)
-
- <=====vacuum 之后lower没有发生变化,但是upper从原来的8064 变成了8096
-
- postgres=# select * from heap_page_items(get_raw_page('test1',0));
- lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
- ----+--------+----------+--------+---------+--------+----------+--------+-------------+------------+--------+--------+-------+------------
- 1 | 8160 | 1 | 28 | 2097522 | 0 | 0 | (0,1) | 1 | 2304 | 24 | | | \x01000000
- 2 | 3 | 2 | 0 | | | | | | | | | |
- 3 | 8128 | 1 | 28 | 2097527 | 0 | 0 | (0,3) | 32769 | 10496 | 24 | | | \x03000000
- 4 | 8096 | 1 | 28 | 2097528 | 0 | 0 | (0,4) | 1 | 2304 | 24 | | | \x04000000
- (4 rows)
-
- <=====新插入一行数据
- postgres=# insert into test1 values (5);
- INSERT 0 1
- postgres=# select * from page_header(get_raw_page('test1', 0));
- lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
- ------------+----------+-------+-------+-------+---------+----------+---------+-----------
- 2/DB17E7C8 | 0 | 0 | 44 | 8064 | 8192 | 8192 | 4 | 0
- (1 row)
-
- postgres=# select * from heap_page_items(get_raw_page('test1',0));
- lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
- ----+--------+----------+--------+---------+--------+----------+--------+-------------+------------+--------+--------+-------+------------
- 1 | 8160 | 1 | 28 | 2097522 | 0 | 0 | (0,1) | 1 | 2304 | 24 | | | \x01000000
- 2 | 3 | 2 | 0 | | | | | | | | | |
- 3 | 8128 | 1 | 28 | 2097527 | 0 | 0 | (0,3) | 32769 | 10496 | 24 | | | \x03000000
- 4 | 8096 | 1 | 28 | 2097528 | 0 | 0 | (0,4) | 1 | 2304 | 24 | | | \x04000000
- 5 | 8064 | 1 | 28 | 2097529 | 0 | 0 | (0,5) | 1 | 2048 | 24 | | | \x05000000
- (5 rows)
-
- postgres=#
vacuum test1之后,虽然空间已经回收,upper从原来的8096变成了8064,但是新插入的一行记录并没有使用lp=2的空间(理论上这个空间(t_ctid(0,2))的空间是可以重新使用的)。
5)、再插入一行数据测试
- postgres=# insert into test1 values (2);
- INSERT 0 1
- postgres=# \x
- Expanded display is off.
- postgres=# select * from page_header(get_raw_page('test1', 0));
- lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
- ------------+----------+-------+-------+-------+---------+----------+---------+-----------
- 2/DB17EA10 | 0 | 0 | 48 | 8032 | 8192 | 8192 | 4 | 0
- (1 row)
-
- postgres=# select * from heap_page_items(get_raw_page('test1',0));
- lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
- ----+--------+----------+--------+---------+--------+----------+--------+-------------+------------+--------+--------+-------+------------
- 1 | 8160 | 1 | 28 | 2097522 | 0 | 0 | (0,1) | 1 | 2304 | 24 | | | \x01000000
- 2 | 3 | 2 | 0 | | | | | | | | | |
- 3 | 8128 | 1 | 28 | 2097527 | 0 | 0 | (0,3) | 32769 | 10496 | 24 | | | \x03000000
- 4 | 8096 | 1 | 28 | 2097528 | 0 | 0 | (0,4) | 1 | 2304 | 24 | | | \x04000000
- 5 | 8064 | 1 | 28 | 2097529 | 0 | 0 | (0,5) | 1 | 2304 | 24 | | | \x05000000
- 6 | 8032 | 1 | 28 | 2097530 | 0 | 0 | (0,6) | 1 | 2304 | 24 | | | \x02000000
- (6 rows)
新插入的数据还是没有使用lp=3的空间。
6)、先把id<6的数据都删除了,之后再插入新数据
- postgres=# delete from test1 where id <6;
- DELETE 4
- postgres=# select * from heap_page_items(get_raw_page('test1',0));
- lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
- ----+--------+----------+--------+---------+---------+----------+--------+-------------+------------+--------+--------+-------+------------
- 1 | 8160 | 1 | 28 | 2097522 | 2097540 | 0 | (0,1) | 8193 | 1280 | 24 | | | \x01000000
- 2 | 3 | 2 | 0 | | | | | | | | | |
- 3 | 8128 | 1 | 28 | 2097527 | 2097540 | 0 | (0,3) | 40961 | 9472 | 24 | | | \x03000000
- 4 | 8096 | 1 | 28 | 2097528 | 2097540 | 0 | (0,4) | 8193 | 1280 | 24 | | | \x04000000
- 5 | 8064 | 1 | 28 | 2097529 | 2097540 | 0 | (0,5) | 8193 | 1280 | 24 | | | \x05000000
- 6 | 8032 | 1 | 28 | 2097530 | 0 | 0 | (0,6) | 1 | 2304 | 24 | | | \x02000000
- (6 rows)
-
- postgres=# vacuum VERBOSE test1;
- INFO: vacuuming "public.test1"
- INFO: "test1": removed 4 row versions in 1 pages
- INFO: "test1": found 4 removable, 1 nonremovable row versions in 1 out of 1 pages
- DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2097542
- There were 1 unused item pointers.
- Skipped 0 pages due to buffer pins, 0 frozen pages.
- 0 pages are entirely empty.
- CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
- VACUUM
- postgres=# select * from heap_page_items(get_raw_page('test1',0));
- lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
- ----+--------+----------+--------+---------+--------+----------+--------+-------------+------------+--------+--------+-------+------------
- 1 | 0 | 0 | 0 | | | | | | | | | |
- 2 | 0 | 0 | 0 | | | | | | | | | |
- 3 | 0 | 0 | 0 | | | | | | | | | |
- 4 | 0 | 0 | 0 | | | | | | | | | |
- 5 | 0 | 0 | 0 | | | | | | | | | |
- 6 | 8160 | 1 | 28 | 2097530 | 0 | 0 | (0,6) | 1 | 2304 | 24 | | | \x06000000
- (6 rows)
-
- postgres=# select * from page_header(get_raw_page('test1', 0));
- lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
- ------------+----------+-------+-------+-------+---------+----------+---------+-----------
- 2/DB18FA68 | 0 | 5 | 48 | 8160 | 8192 | 8192 | 4 | 0
- (1 row)
-
- <======vacuum之后lower的值没有发生变化,但是upper从原来的8032 变成了8160,说明空间释放了
-
- postgres=# insert into test1 values (1);
- INSERT 0 1
- postgres=#
- postgres=# select * from page_header(get_raw_page('test1', 0));
- lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
- ------------+----------+-------+-------+-------+---------+----------+---------+-----------
- 2/DB18FC00 | 0 | 1 | 48 | 8128 | 8192 | 8192 | 4 | 0
- (1 row)
-
- postgres=# select * from heap_page_items(get_raw_page('test1',0));
- lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
- ----+--------+----------+--------+---------+--------+----------+--------+-------------+------------+--------+--------+-------+------------
- 1 | 8128 | 1 | 28 | 2097542 | 0 | 0 | (0,1) | 1 | 2048 | 24 | | | \x01000000
- 2 | 0 | 0 | 0 | | | | | | | | | |
- 3 | 0 | 0 | 0 | | | | | | | | | |
- 4 | 0 | 0 | 0 | | | | | | | | | |
- 5 | 0 | 0 | 0 | | | | | | | | | |
- 6 | 8160 | 1 | 28 | 2097530 | 0 | 0 | (0,6) | 1 | 2304 | 24 | | | \x06000000
- (6 rows)
-
- postgres=#
- <=====新插入的数据也使用了之前已经分配过的空间(t_ctid(0,1))。
通过上面的测试发现,其实vacuum table是不会降低page的高水位的,只是把旧数据占的空间释放,所以如果vaccum不及时导致了表的膨胀,那么再使用vacuum table是不能把表的大小降下去的。
测试如下:
7)、创建测试表
- postgres=#
- postgres=# create table tbl (id int primary key, info text, crt_time timestamp);
- CREATE TABLE
- postgres=# insert into tbl select generate_series(1,2000000),md5(random()::text),clock_timestamp();
- INSERT 0 2000000
- postgres=# \dt+ tbl
- List of relations
- Schema | Name | Type | Owner | Size | Description
- --------+------+-------+----------+--------+-------------
- public | tbl | table | postgres | 146 MB |
- (1 row)
-
- postgres=# \dt+ tbl_pkey
- Did not find any relation named "tbl_pkey".
- postgres=# \di+ tbl_pkey
- List of relations
- Schema | Name | Type | Owner | Table | Size | Description
- --------+----------+-------+----------+-------+-------+-------------
- public | tbl_pkey | index | postgres | tbl | 43 MB |
- (1 row)
-
- postgres=#
8)、编写测试脚本
[postgres@postgres ~]$ cat test.sql
\set id random(1,2000000)
update tbl set info=md5(random()::text) where id between :id-250000 and :id+250000;
[postgres@postgres ~]$
9)、测试数据库中存在持有事务号的长事务,这个事务时间段内,数据库产生的垃圾无法被回收。
- [postgres@postgres ~]$ pgbench -M prepared -h 127.0.0.1 -p 5442 -n -r -f ./test.sql -c 1 -j 1 -T 30
- transaction type: ./test.sql
- scaling factor: 1
- query mode: prepared
- number of clients: 1
- number of threads: 1
- duration: 30 s
- number of transactions actually processed: 4
- latency average = 10183.262 ms
- tps = 0.098200 (including connections establishing)
- tps = 0.098204 (excluding connections establishing)
- statement latencies in milliseconds:
- 0.005 \set id random(1,2000000)
- 10182.856 update tbl set info=md5(random()::text) where id between :id-250000 and :id+250000;
- [postgres@postgres ~]$
查看更新表之后的表大小和索引大小
- postgres=# \dt+ tbl
- List of relations
- Schema | Name | Type | Owner | Size | Description
- --------+------+-------+----------+--------+-------------
- public | tbl | table | postgres | 203 MB |
- (1 row)
-
- postgres=# \di+ tbl_pkey
- List of relations
- Schema | Name | Type | Owner | Table | Size | Description
- --------+----------+-------+----------+-------+--------+-------------
- public | tbl_pkey | index | postgres | tbl | 103 MB |
- (1 row)
10)、执行vacuum tbl
- postgres=# vacuum VERBOSE tbl;
- INFO: vacuuming "public.tbl"
- INFO: "tbl": found 0 removable, 107 nonremovable row versions in 1 out of 26029 pages
- DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2097561
- There were 0 unused item pointers.
- Skipped 0 pages due to buffer pins, 7335 frozen pages.
- 0 pages are entirely empty.
- CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
- INFO: vacuuming "pg_toast.pg_toast_157481"
- INFO: index "pg_toast_157481_index" now contains 0 row versions in 1 pages
- DETAIL: 0 index row versions were removed.
- 0 index pages have been deleted, 0 are currently reusable.
- CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
- INFO: "pg_toast_157481": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
- DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 2097561
- There were 0 unused item pointers.
- Skipped 0 pages due to buffer pins, 0 frozen pages.
- 0 pages are entirely empty.
- CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
- VACUUM
- postgres=# \dt+ tbl
- List of relations
- Schema | Name | Type | Owner | Size | Description
- --------+------+-------+----------+--------+-------------
- public | tbl | table | postgres | 203 MB |
- (1 row)
-
- postgres=# \di+ tbl_pkey
- List of relations
- Schema | Name | Type | Owner | Table | Size | Description
- --------+----------+-------+----------+-------+--------+-------------
- public | tbl_pkey | index | postgres | tbl | 103 MB |
- (1 row)
-
- postgres=#
vacuum发现表的空间和tbl的空间并没有释放,所以,如果已经膨胀之后的表,再使用vacuum table是无法降低表的大小的,但是使用vacuum full table是可以降低表的大小的。
如下:
- postgres=# vacuum FULL VERBOSE tbl;
- INFO: vacuuming "public.tbl"
- INFO: "tbl": found 0 removable, 2000000 nonremovable row versions in 26029 pages
- DETAIL: 0 dead row versions cannot be removed yet.
- CPU: user: 0.25 s, system: 2.93 s, elapsed: 3.54 s.
- VACUUM
- postgres=# \dt+ tbl
- List of relations
- Schema | Name | Type | Owner | Size | Description
- --------+------+-------+----------+--------+-------------
- public | tbl | table | postgres | 146 MB |
- (1 row)
-
- postgres=# \di+ tbl_pkey
- List of relations
- Schema | Name | Type | Owner | Table | Size | Description
- --------+----------+-------+----------+-------+-------+-------------
- public | tbl_pkey | index | postgres | tbl | 43 MB |
- (1 row)
PostgreSQL的表或索引发生膨胀后,用户可以使用vacuum full或rewrite table(如cluster)的方式重建表。
但是vacuum full或者rewrite都需要持有排它锁,会堵塞读操作。为了减少锁冲突,社区有一个名为pg_reorg或pg_repack的插件,使用了增量的方式重组数据,最后通过切换FILENODE完成数据重组。仅仅在切换FILENODE时需要持有排他锁,非常短暂,影响比VACUUM FULL和rewrite的方式小多了。
参考资料:
postgresql磁盘存储结构 - exact - 博客园
https://github.com/digoal/blog/blob/master/201504/20150429_02.md
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。