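WAL segment files (for example 000000010000000000000001) are created in the pg_xlog directory under the data directory (PostgreSQL 9.6 and earlier; the directory was renamed pg_wal in version 10). Each WAL segment is divided into pages, laid out as follows.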
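As a small illustration of the segment file name above, the sketch below splits the 24 hex digits into three 8-digit fields: timeline ID, log number, and segment number within that log. The helper name `parse_wal_segment_name` is hypothetical and not part of PostgreSQL; it only mirrors the `%08X%08X%08X` naming pattern.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a PostgreSQL function): split a 24-hex-digit
 * WAL segment file name into its three 8-digit fields.
 * Returns 0 on success, -1 if the name is malformed.
 */
static int parse_wal_segment_name(const char *name, uint32_t *tli,
                                  uint32_t *log_id, uint32_t *seg_id)
{
    unsigned int a, b, c;

    assert(name != NULL);

    /* The name must be exactly 24 upper-case hex digits. */
    if (strlen(name) != 24)
        return -1;
    if (strspn(name, "0123456789ABCDEF") != 24)
        return -1;
    if (sscanf(name, "%8X%8X%8X", &a, &b, &c) != 3)
        return -1;

    *tli = a;       /* timeline ID */
    *log_id = b;    /* high part of the segment address */
    *seg_id = c;    /* segment number within that log */
    return 0;
}
```

For the file name in the text this yields timeline 1, log 0, segment 1.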
A WAL page has one of two header structures: XLogPageHeaderData or XLogLongPageHeaderData.
The first page of a segment file uses XLogLongPageHeaderData; every later page uses XLogPageHeaderData.
XLogLongPageHeaderData has three more members than XLogPageHeaderData:
xlp_sysid is the system identifier, matching the one stored in pg_control;
xlp_seg_size is the segment size;
xlp_xlog_blcksz is the WAL page size.
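For reference, the two page headers can be sketched as below. The member lists follow src/include/access/xlog_internal.h as I understand it, with the PostgreSQL typedefs replaced by plain stdint types so the block compiles on its own. Treat it as an illustration and check it against your source tree. `page_header_size` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TimeLineID;   /* stand-ins for the PostgreSQL typedefs */
typedef uint64_t XLogRecPtr;

typedef struct XLogPageHeaderData
{
    uint16_t   xlp_magic;      /* magic value for correctness checks */
    uint16_t   xlp_info;       /* flag bits */
    TimeLineID xlp_tli;        /* timeline of first record on page */
    XLogRecPtr xlp_pageaddr;   /* WAL address of this page */
    uint32_t   xlp_rem_len;    /* remaining length of a continued record */
} XLogPageHeaderData;

typedef struct XLogLongPageHeaderData
{
    XLogPageHeaderData std;    /* standard header fields */
    uint64_t   xlp_sysid;      /* system identifier from pg_control */
    uint32_t   xlp_seg_size;   /* segment size, as a cross-check */
    uint32_t   xlp_xlog_blcksz;/* WAL page size, as a cross-check */
} XLogLongPageHeaderData;

#define XLP_LONG_HEADER 0x0002 /* set in xlp_info when the long header is used */

/* Hypothetical helper: size of the header that starts this page. */
static size_t page_header_size(const XLogPageHeaderData *hdr)
{
    assert(hdr != NULL);
    return (hdr->xlp_info & XLP_LONG_HEADER)
        ? sizeof(XLogLongPageHeaderData)
        : sizeof(XLogPageHeaderData);
}
```

The three extra members add 16 bytes, so a reader can pick the right header size from xlp_info before touching the long-only fields.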
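The continuation-data block that follows the page header holds the part of the previous page's last record that did not fit on that page.
When a WAL record spans pages, the xlp_info field in the new page's header is flagged with XLP_FIRST_IS_CONTRECORD: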
/* When record crosses page boundary, set this flag in new page's header */
#define XLP_FIRST_IS_CONTRECORD 0x0001
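WAL records may span pages: when the space left in the current page cannot hold the whole record, the rest is stored in the next page. The xlp_rem_len field of XLogPageHeaderData
records how many bytes of that record remain to be stored from the start of this page. When xlp_rem_len is 0, the continuation-data block does not exist.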
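The rule above can be sketched as a function that finds where the first new record begins on a page. This is a minimal sketch, assuming 8-byte MAXALIGN and that the caller already knows the header size. `first_record_offset` is a hypothetical name, and a return value of 0 is this sketch's convention for "no record starts on this page".

```c
#include <stdint.h>

#define XLP_FIRST_IS_CONTRECORD 0x0001

/* Assumption: MAXALIGN is 8 bytes, as on common 64-bit platforms. */
#define MAXALIGN8(len) (((uint32_t) (len) + 7u) & ~7u)

/*
 * Hypothetical helper: offset within the page at which the first record
 * that *starts* on this page begins, or 0 if the continuation data
 * occupies the whole page.
 */
static uint32_t first_record_offset(uint16_t xlp_info, uint32_t xlp_rem_len,
                                    uint32_t header_size, uint32_t page_size)
{
    uint32_t off;

    /* No continuation data: records start right after the page header. */
    if (!(xlp_info & XLP_FIRST_IS_CONTRECORD))
        return header_size;

    /* The continued record does not even end on this page. */
    if (xlp_rem_len > page_size - header_size)
        return 0;

    /* Skip the continuation data, rounded up to the alignment boundary. */
    off = header_size + MAXALIGN8(xlp_rem_len);
    return (off >= page_size) ? 0 : off;
}
```

With a 24-byte header and xlp_rem_len of 100, the continuation data is padded to 104 bytes and the next record starts at offset 128.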
See the WAL record structure described below.
The last record on a page may be incomplete; the remainder may be stored in the next page.
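In older releases the fixed-size XLogRecord header could not span pages: when the space left in a page was too small to hold an XLogRecord struct, that space was left unused. Since PostgreSQL 9.3 the header itself may be split across two pages, and only its first field, xl_tot_len, is guaranteed to lie on the first page.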
Each WAL record is structured as follows.
XLogRecord is the entry point of a WAL record; parsing a record starts from this struct. Its definition is:
typedef struct XLogRecord
{
    uint32        xl_tot_len;   /* total len of entire record */
    TransactionId xl_xid;       /* xact id */
    XLogRecPtr    xl_prev;      /* ptr to previous record in log */
    uint8         xl_info;      /* flag bits, see below */
    RmgrId        xl_rmid;      /* resource manager for this record */
    /* 2 bytes of padding here, initialize to zero */
    pg_crc32c     xl_crc;       /* CRC for this record */
    /* XLogRecordBlockHeaders and XLogRecordDataHeader follow, no padding */
} XLogRecord;
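Meaning of each member:
xl_tot_len: total length of the record, including every component (headers and data).
xl_xid: ID of the transaction that produced the record.
xl_prev: location of the previous record.
xl_info: identifies the subtype of the WAL record. It is interpreted together with xl_rmid; for example, when xl_rmid is RM_HEAP_ID, xl_info can be XLOG_HEAP_INSERT, XLOG_HEAP_DELETE, or XLOG_HEAP_UPDATE.
xl_rmid: identifies the type of the WAL record (its resource manager). For example, RM_XACT_ID marks transaction records, RM_DBASE_ID marks database creation and removal, and RM_HEAP_ID marks inserts, deletes, and updates of table data. The possible values are listed in src/include/access/rmgrlist.h.
xl_crc: CRC checksum of the record.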
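To make the xl_rmid / xl_info pairing concrete, here is a minimal sketch that mirrors the XLogRecord layout with plain stdint types and decodes the heap operation from xl_info. The flag values are the ones I recall from src/include/access/heapam_xlog.h, so verify them against your PostgreSQL version. `heap_op_name` is a hypothetical helper that applies only when xl_rmid is RM_HEAP_ID.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of XLogRecord using stdint types (an illustrative assumption). */
typedef struct XLogRecord
{
    uint32_t xl_tot_len;   /* total length of the entire record */
    uint32_t xl_xid;       /* transaction id */
    uint64_t xl_prev;      /* pointer to the previous record */
    uint8_t  xl_info;      /* flag bits */
    uint8_t  xl_rmid;      /* resource manager id */
    /* 2 bytes of padding are inserted here by the compiler */
    uint32_t xl_crc;       /* CRC of the record */
} XLogRecord;

/* Heap opcodes occupy bits 4-6 of xl_info; bit 7 is an extra flag. */
#define XLOG_HEAP_INSERT    0x00
#define XLOG_HEAP_DELETE    0x10
#define XLOG_HEAP_UPDATE    0x20
#define XLOG_HEAP_OPMASK    0x70
#define XLOG_HEAP_INIT_PAGE 0x80

/* Hypothetical helper: name the heap operation encoded in xl_info. */
static const char *heap_op_name(uint8_t xl_info)
{
    switch (xl_info & XLOG_HEAP_OPMASK)
    {
        case XLOG_HEAP_INSERT: return "INSERT";
        case XLOG_HEAP_DELETE: return "DELETE";
        case XLOG_HEAP_UPDATE: return "UPDATE";
        default:               return "OTHER";
    }
}
```

Masking with XLOG_HEAP_OPMASK first matters because an insert that also initializes the page carries XLOG_HEAP_INIT_PAGE in the same byte.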
typedef struct XLogRecordBlockHeader
{
    uint8   id;             /* block reference ID */
    uint8   fork_flags;     /* fork within the relation, and flags */
    uint16  data_length;    /* number of payload bytes (not including page
                             * image) */
    /* If BKPBLOCK_HAS_IMAGE, an XLogRecordBlockImageHeader struct follows */
    /* If BKPBLOCK_SAME_REL is not set, a RelFileNode follows */
    /* BlockNumber follows */
} XLogRecordBlockHeader;
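Meaning of each member:
id: a record can reference several blocks (block IDs run from 0 to XLR_MAX_BLOCK_ID, which is 32); id is this block's index.
fork_flags: the low 4 bits hold the fork number and the high 4 bits are flags describing what this block reference carries.
data_length: length of the data stored for this block in the tuple data area (not including any page image).
fork_flags takes the following values: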
/*
* The fork number fits in the lower 4 bits in the fork_flags field. The upper
* bits are used for flags.
*/
#define BKPBLOCK_FORK_MASK 0x0F
#define BKPBLOCK_FLAG_MASK 0xF0
#define BKPBLOCK_HAS_IMAGE 0x10 /* block data is an XLogRecordBlockImage, i.e. a full-page-write image */
#define BKPBLOCK_HAS_DATA 0x20 /* block carries tuple-level change data */
#define BKPBLOCK_WILL_INIT 0x40 /* redo will re-init the page */
#define BKPBLOCK_SAME_REL 0x80 /* RelFileNode omitted, same as previous: the block belongs to the same relation as the previous block */
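A minimal sketch of how fork_flags could be unpacked with the masks above. The `BlockFlags` struct and `decode_fork_flags` function are hypothetical names introduced only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define BKPBLOCK_FORK_MASK 0x0F
#define BKPBLOCK_FLAG_MASK 0xF0
#define BKPBLOCK_HAS_IMAGE 0x10
#define BKPBLOCK_HAS_DATA  0x20
#define BKPBLOCK_WILL_INIT 0x40
#define BKPBLOCK_SAME_REL  0x80

/* Hypothetical decoded view of the fork_flags byte. */
typedef struct BlockFlags
{
    uint8_t fork_num;   /* low 4 bits: fork within the relation */
    bool    has_image;  /* a full-page image is attached */
    bool    has_data;   /* tuple-level data is attached */
    bool    will_init;  /* redo re-initializes the page */
    bool    same_rel;   /* RelFileNode omitted, same as previous block */
} BlockFlags;

static BlockFlags decode_fork_flags(uint8_t fork_flags)
{
    BlockFlags f;

    f.fork_num  = fork_flags & BKPBLOCK_FORK_MASK;
    f.has_image = (fork_flags & BKPBLOCK_HAS_IMAGE) != 0;
    f.has_data  = (fork_flags & BKPBLOCK_HAS_DATA) != 0;
    f.will_init = (fork_flags & BKPBLOCK_WILL_INIT) != 0;
    f.same_rel  = (fork_flags & BKPBLOCK_SAME_REL) != 0;
    return f;
}
```

For example, a byte of 0xA1 decodes to fork 1 with tuple data attached and the RelFileNode omitted.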
XLogRecordBlockImageHeader is present when the block reference carries a full-page-write image:
/*
 * Additional header information when a full-page image is included
 * (i.e. when BKPBLOCK_HAS_IMAGE is set).
 *
 * As a trivial form of data compression, the XLOG code is aware that
 * PG data pages usually contain an unused "hole" in the middle, which
 * contains only zero bytes. If the length of "hole" > 0 then we have removed
 * such a "hole" from the stored data (and it's not counted in the
 * XLOG record's CRC, either). Hence, the amount of block data actually
 * present is BLCKSZ - the length of "hole" bytes.
 *
 * When wal_compression is enabled, a full page image which "hole" was
 * removed is additionally compressed using PGLZ compression algorithm.
 * This can reduce the WAL volume, but at some extra cost of CPU spent
 * on the compression during WAL logging. In this case, since the "hole"
 * length cannot be calculated by subtracting the number of page image bytes
 * from BLCKSZ, basically it needs to be stored as an extra information.
 * But when no "hole" exists, we can assume that the "hole" length is zero
 * and no such an extra information needs to be stored. Note that
 * the original version of page image is stored in WAL instead of the
 * compressed one if the number of bytes saved by compression is less than
 * the length of extra information. Hence, when a page image is successfully
 * compressed, the amount of block data actually present is less than
 * BLCKSZ - the length of "hole" bytes - the length of extra information.
 */
typedef struct XLogRecordBlockImageHeader
{
    uint16  length;         /* number of page image bytes */
    uint16  hole_offset;    /* number of bytes before "hole" */
    uint8   bimg_info;      /* flag bits, see below */

    /*
     * If BKPIMAGE_HAS_HOLE and BKPIMAGE_IS_COMPRESSED, an
     * XLogRecordBlockCompressHeader struct follows.
     */
} XLogRecordBlockImageHeader;
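Meaning of each member:
length: total length of the stored page image (after the hole is removed and, if enabled, after compression).
hole_offset: number of bytes of page data that precede the hole.
bimg_info: flag bits recording whether the page has a hole and whether the image is compressed.
Note: the hole is the part of a data page that holds no tuples and consists entirely of zero bytes. To reduce WAL volume, PostgreSQL strips the hole when writing the record and may also compress the image.
bimg_info takes the following values: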
/* Information stored in bimg_info */
#define BKPIMAGE_HAS_HOLE 0x01 /* page image has "hole" */
#define BKPIMAGE_IS_COMPRESSED 0x02 /* page image is compressed */
XLogRecordBlockCompressHeader records the size of the hole:
/*
* Extra header information used when page image has "hole" and
* is compressed.
*/
typedef struct XLogRecordBlockCompressHeader
{
    uint16  hole_length;    /* number of bytes in "hole" */
} XLogRecordBlockCompressHeader;
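The hole handling can be illustrated with a sketch that rebuilds a full page from an uncompressed image. It assumes the default BLCKSZ of 8192 and derives the hole length as BLCKSZ minus length, which is valid only for uncompressed images. For a compressed image the hole length would come from XLogRecordBlockCompressHeader after decompression. `restore_page_image` is a hypothetical name and not the PostgreSQL routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192   /* assumption: default PostgreSQL block size */

/*
 * Hypothetical helper: rebuild a BLCKSZ-byte page from an uncompressed
 * image whose zero-filled hole was removed.
 *   img         - stored image bytes (hole removed)
 *   length      - number of stored image bytes
 *   hole_offset - number of page bytes that precede the hole
 * Returns 0 on success, -1 if the header values are inconsistent.
 */
static int restore_page_image(const uint8_t *img, uint16_t length,
                              uint16_t hole_offset, uint8_t *page)
{
    size_t hole_length;

    assert(img != NULL && page != NULL);

    if (length > BLCKSZ || hole_offset > length)
        return -1;

    hole_length = (size_t) BLCKSZ - length;

    /* Data before the hole, the zeroed hole, then data after the hole. */
    memcpy(page, img, hole_offset);
    memset(page + hole_offset, 0, hole_length);
    memcpy(page + hole_offset + hole_length,
           img + hole_offset,
           (size_t) length - hole_offset);
    return 0;
}
```

A 100-byte hole at offset 40 therefore comes back as zero bytes 40 through 139 of the rebuilt page.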
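RelFileNode identifies the relation this block belongs to. If the current block belongs to the same relation as the previous block, the BKPBLOCK_SAME_REL flag is set in fork_flags and the RelFileNode is omitted.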
typedef struct RelFileNode
{
    Oid     spcNode;    /* tablespace */
    Oid     dbNode;     /* database */
    Oid     relNode;    /* relation */
} RelFileNode;
BlockNumber records the block number of the page this block reference describes.
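Putting the pieces together, the header area of one block reference has a variable size depending on the flags. The sketch below adds up the on-disk sizes (4 bytes for XLogRecordBlockHeader, 5 for XLogRecordBlockImageHeader, 2 for XLogRecordBlockCompressHeader, 12 for RelFileNode, 4 for BlockNumber). These unpadded sizes are my reading of the struct definitions above, and `block_ref_header_size` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

#define BKPBLOCK_HAS_IMAGE     0x10
#define BKPBLOCK_SAME_REL      0x80
#define BKPIMAGE_HAS_HOLE      0x01
#define BKPIMAGE_IS_COMPRESSED 0x02

/* Unpadded on-disk sizes (assumptions derived from the structs above). */
enum
{
    BLOCK_HEADER_SIZE    = 4,   /* id + fork_flags + data_length */
    IMAGE_HEADER_SIZE    = 5,   /* length + hole_offset + bimg_info */
    COMPRESS_HEADER_SIZE = 2,   /* hole_length */
    RELFILENODE_SIZE     = 12,  /* spcNode + dbNode + relNode */
    BLOCKNUMBER_SIZE     = 4
};

/* Hypothetical helper: bytes of header for one block reference. */
static size_t block_ref_header_size(uint8_t fork_flags, uint8_t bimg_info)
{
    size_t n = BLOCK_HEADER_SIZE;

    if (fork_flags & BKPBLOCK_HAS_IMAGE)
    {
        n += IMAGE_HEADER_SIZE;
        /* The compress header exists only for a compressed image with a hole. */
        if ((bimg_info & BKPIMAGE_HAS_HOLE) &&
            (bimg_info & BKPIMAGE_IS_COMPRESSED))
            n += COMPRESS_HEADER_SIZE;
    }

    /* RelFileNode is written unless it repeats the previous block's relation. */
    if (!(fork_flags & BKPBLOCK_SAME_REL))
        n += RELFILENODE_SIZE;

    return n + BLOCKNUMBER_SIZE;
}
```

A plain block reference costs 20 bytes of header, and one that reuses the previous relation costs 8.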
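The XLogRecordDataHeader structures describe the main data part of a record (for example checkpoint data). When the main data is smaller than 256 bytes, XLogRecordDataHeaderShort is used;
otherwise XLogRecordDataHeaderLong is used.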
typedef struct XLogRecordDataHeaderShort
{
    uint8   id;             /* XLR_BLOCK_ID_DATA_SHORT */
    uint8   data_length;    /* number of payload bytes */
} XLogRecordDataHeaderShort;

typedef struct XLogRecordDataHeaderLong
{
    uint8   id;             /* XLR_BLOCK_ID_DATA_LONG */
    /* followed by uint32 data_length, unaligned */
} XLogRecordDataHeaderLong;
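A reader tells the two forms apart by the id byte. The sketch below assumes the ID values 255 (short) and 254 (long) from xlogrecord.h and reads the long length with memcpy because it is stored unaligned in the host byte order. `parse_main_data_header` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XLR_BLOCK_ID_DATA_SHORT 255
#define XLR_BLOCK_ID_DATA_LONG  254

/*
 * Hypothetical helper: parse a main-data header at p (avail bytes readable).
 * Stores the payload length in *data_length and returns the number of
 * header bytes consumed, or 0 if p does not hold a main-data header.
 */
static size_t parse_main_data_header(const uint8_t *p, size_t avail,
                                     uint32_t *data_length)
{
    assert(p != NULL && data_length != NULL);

    if (avail >= 2 && p[0] == XLR_BLOCK_ID_DATA_SHORT)
    {
        *data_length = p[1];            /* one-byte length */
        return 2;
    }
    if (avail >= 5 && p[0] == XLR_BLOCK_ID_DATA_LONG)
    {
        /* The uint32 is unaligned, so copy it instead of casting. */
        memcpy(data_length, p + 1, sizeof(uint32_t));
        return 5;
    }
    return 0;
}
```

The short form saves three bytes per record, which adds up because most records carry only a small main data payload.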
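Block data holds two kinds of data: full-page-write data (full page images) and tuple data (tuple-level change records).
The main data part holds data that is not tied to a buffer, such as checkpoint records.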