赞
踩
说明:
Postgres的主备,也叫物理复制,和发布订阅不同的是,发布订阅可以针对某个表,主备物理复制是针对整个数据库的。有点类似SQL Server的主备和Oracle Data Guard。
参考文档:
http://www.postgres.cn/docs/10/warm-standby.html
-- 主备信息
PG 版本 10.15
primay : 192.168.2.80
standby : 192.168.2.81
-- 安装主备库(略)
-- 参数修改,只需要修改主库的,备库的postgres.conf、pg_hba.conf等包含在data目录下的文件会被自动备份到备库
- listen_address = '*'
- wal_level = replica
- archive_mode = on
- archive_command = 'cp %p /postgres/archive/%f'
- max_wal_senders= 10
- wal_keep_segments=1024
- hot_standby = on
-
-
- mkdir -p /postgres/archive -- 主备上存放归档的目录
- chown -R postgres.postgres /postgres/archive/
-- 创建复制账号
create user repl REPLICATION LOGIN ENCRYPTED PASSWORD 'oracle';
-- 修改pg_hba.conf
- host all all 192.168.2.0/24 md5
- host replication repl 192.168.2.80/24 md5
- host replication repl 192.168.2.81/24 md5
-- 备库上在线备份主库 (备库要处于关闭状态,并且备库的data目录下应该为空)
- pg_basebackup -h 192.168.2.80 -U repl -p 5432 -F p -X s -v -P -R -D /opt/PostgreSQL/10/data/ -l pg_init
-
- -F 指定生成备份的数据格式 (p 原样输出,t tar格式输出)
- -X 备份开始后,启动另一个流复制连接从主库接收wal日志,有fetch和stream两种方式,建议stream方式
- -v verbose格式,显示内容在屏幕上
- -P 显示过程百分比
- -R 备份结束后,自动生成recover.conf文件
- -D 把备份写入到那个目录
- -l 表示指定个备份的标识,运行命令后可以看到进度提示
因为备库的data目录下有文件,所以备份的时候报错
- [postgres@test1 /opt/PostgreSQL/10/data]$pg_basebackup -h 192.168.2.80 -U repl -p 5432 -F p -X s -v -P -R -D /opt/PostgreSQL/10/data/ -l pg_init
- pg_basebackup: directory "/opt/PostgreSQL/10/data/" exists but is not empty
清理data目录后,再次备份,就可以了
- [postgres@test1 /opt/PostgreSQL/10/data]$rm -rf *
- [postgres@test1 /opt/PostgreSQL/10/data]$pg_basebackup -h 192.168.2.80 -U repl -p 5432 -F p -X s -v -P -R -D /opt/PostgreSQL/10/data/ -l pg_init
- Password:
- pg_basebackup: initiating base backup, waiting for checkpoint to complete
- pg_basebackup: checkpoint completed
- pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
- pg_basebackup: starting background WAL receiver
- 40635/40635 kB (100%), 1/1 tablespace
- pg_basebackup: write-ahead log end point: 0/2000130
- pg_basebackup: waiting for background process to finish streaming ...
- pg_basebackup: base backup completed
-
- [postgres@test1 /opt/PostgreSQL/10/data]$ls
- backup_label log pg_ident.conf pg_replslot pg_stat_tmp PG_VERSION postgresql.conf
- base pg_commit_ts pg_logical pg_serial pg_subtrans pg_wal recovery.conf
- current_logfiles pg_dynshmem pg_multixact pg_snapshots pg_tblspc pg_xact
- global pg_hba.conf pg_notify pg_stat pg_twophase postgresql.auto.conf
- [postgres@test1 /opt/PostgreSQL/10/data]$

-- 备库修改recovery.conf (-R参数 备份结束后,自动生成recover.conf文件)
- standby_mode = 'on'
- primary_conninfo = 'user=repl password=oracle host=192.168.2.80 port=5432 sslmode=disable sslcompression=0 target_session_attrs=any'
- recovery_target_timeline = 'latest'
-
- standby_mode -- 说明是否启用数据库为备库
- primary_conninfo -- 设置主库的连接方式
- recovery_target_timeline -- 设置恢复的时间线
自动生成的recovery.conf
- [postgres@test1 /opt/PostgreSQL/10/data]$more recovery.conf
- standby_mode = 'on'
- primary_conninfo = 'user=repl password=oracle host=192.168.2.80 port=5432 sslmode=prefer sslcompress
- ion=1 krbsrvname=postgres target_session_attrs=any'
- [postgres@test1 /opt/PostgreSQL/10/data]$
-- 启动从库
- [postgres@test1 /opt/PostgreSQL/10/data]$pg_ctl start
- waiting for server to start....2021-07-14 09:23:35.939 CST [15353] LOG: listening on IPv4 address "0.0.0.0", port 5432
- 2021-07-14 09:23:35.939 CST [15353] LOG: listening on IPv6 address "::", port 5432
- 2021-07-14 09:23:35.940 CST [15353] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
- 2021-07-14 09:23:35.968 CST [15353] LOG: redirecting log output to logging collector process
- 2021-07-14 09:23:35.968 CST [15353] HINT: Future log output will appear in directory "log".
- done
- server started
- [postgres@test1 /opt/PostgreSQL/10/data]$
-- 测试主从同步
- mydb=# create table pub_t1(id int primary key,name text,crt_time timestamp);
- CREATE TABLE
- mydb=# insert into pub_t1 values(1,'aa',now());
- INSERT 0 1
- mydb=# select * from pub_t1;
- id | name | crt_time
- ----+------+---------------------------
- 1 | aa | 2021-07-14 09:27:44.59695
- (1 row)
-
- mydb=#
-
- postgres=# \c mydb
- You are now connected to database "mydb" as user "postgres".
- mydb=# select * from pub_t1;
- id | name | crt_time
- ----+------+---------------------------
- 1 | aa | 2021-07-14 09:27:44.59695
- (1 row)
-
- mydb=#

--查看复制状态 ,pg_stat_replication (在主库上查询,备库上看不到复制信息,备库是只读的)
- select pid ,usesysid,usename,client_addr,state,sync_state from pg_stat_replication;
-
- mydb=# select pid ,usesysid,usename,client_addr,state,sync_state from pg_stat_replication;
- pid | usesysid | usename | client_addr | state | sync_state
- -------+----------+---------+--------------+-----------+------------
- 13077 | 16406 | repl | 192.168.2.81 | streaming | async
- (1 row)
-
- mydb=#
-
- mydb=# select pid ,usesysid,usename,client_addr,state,sync_state from pg_stat_replication;
- pid | usesysid | usename | client_addr | state | sync_state
- -----+----------+---------+-------------+-------+------------
- (0 rows)
-
- mydb=# insert into pub_t1 values(2,'bb',now());
- ERROR: cannot execute INSERT in a read-only transaction
- mydb=#

--查看pg wal的接收情况 (在备库上查询,主库上查询返回0)
- postgres=# select * from pg_stat_wal_receiver;
- -[ RECORD 1 ]---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- pid | 16763
- status | streaming
- receive_start_lsn | 0/3000000
- receive_start_tli | 1
- received_lsn | 0/3000000
- received_tli | 1
- last_msg_send_time | 2021-07-14 11:17:44.260407+08
- last_msg_receipt_time | 2021-07-14 11:17:44.227817+08
- latest_end_lsn | 0/301F920
- latest_end_time | 2021-07-14 09:44:06.28544+08
- slot_name |
- conninfo | user=repl password=******** dbname=replication host=192.168.2.80 port=5432 fallback_application_name=walreceiver sslmode=prefer sslcompression=1 krbsrvname=postgres target_session_attrs=any
-
- postgres=#

-- 监控主备看的同步 ,通过比较主服务器上的当前 WAL 写位置和后备服务器接收到的最后一个 WAL 位置来计算这个滞后量
- mydb=# select * from pg_current_wal_lsn();
- -[ RECORD 1 ]------+----------
- pg_current_wal_lsn | 0/301FAB0
-
- mydb=#
-
-
- mydb=# select * from pg_last_wal_receive_lsn();
- -[ RECORD 1 ]-----------+----------
- pg_last_wal_receive_lsn | 0/301FAB0
-
- mydb=#
--通过pg_stat_replication视图检查wal发送者进程。pg_current_wal_lsn 与视图的sent_lsn域之间的巨大差异表示主服务器承受着巨大的负载 (在主库上查询)
- mydb=# select pg_current_wal_lsn(); -- 主库上查询
- -[ RECORD 1 ]------+----------
- pg_current_wal_lsn | 0/301FB90
-
- mydb=# select sent_lsn from pg_stat_replication; -- 主库上查询
- -[ RECORD 1 ]-------
- sent_lsn | 0/301FB90
-
- mydb=#
--而sent_lsn和后备服务器上pg_last_wal_receive_lsn 之间的差异可能表示网络延迟或者后备服务器正承受着巨大的负载。
- mydb=# select sent_lsn from pg_stat_replication; -- 主库上查询
- -[ RECORD 1 ]-------
- sent_lsn | 0/301FB90
-
- mydb=#
- mydb=# select pg_last_wal_receive_lsn(); -- 备库上查询
- -[ RECORD 1 ]-----------+----------
- pg_last_wal_receive_lsn | 0/301FB90
-
- mydb=#
-- 调整为同步复制 .配置同步复制就只需要一个额外的配置步骤: synchronous_standby_names必须被设置为一个非空值。 synchronous_commit也必须被设置为on
将synchronous_commit设置为remote_write 将导致每次提交都等待后备服务器已经接收提交记录并将它写出到其自身所在的操作系统的确认, 但并非等待数据都被刷出到后备服务器上的磁盘。这种设置提供了比on 要弱一点的持久性保障:在一次操作系统崩溃事件中后备服务器可能丢失数据, 尽管它不是一次PostgreSQL崩溃。不过,在实际中它是一种有用的设置, 因为它可以减少事务的响应时间。 只有当主服务器和后备服务器都崩溃并且主服务器的数据库同时被损坏的情况下, 数据丢失才会发生。
把synchronous_commit设置为remote_apply 将导致每一次提交都会等待,直到当前的同步后备服务器报告说它们已经重放了该事务, 这样就会使该事务对用户查询可见。在简单的情况下, 这允许带有因果一致性的负载均衡。
在主库上修改synchronous_commit及synchronous_standby_names参数,调整后,reload或重启
- synchronous_commit = remote_write
- synchronous_standby_names = '*'
-
- mydb=# alter system set synchronous_commit = remote_write;
- ALTER SYSTEM
- mydb=# alter system set synchronous_standby_names = '*';
- ALTER SYSTEM
- mydb=# select pid ,usesysid,usename,client_addr,state,sync_state from pg_stat_replication;
- pid | usesysid | usename | client_addr | state | sync_state
- -------+----------+---------+--------------+-----------+------------
- 14143 | 16406 | repl | 192.168.2.81 | streaming | sync
- (1 row)
-
- mydb=#
再次修改回异步模式
- mydb=# alter system set synchronous_commit = on;
- ALTER SYSTEM
- mydb=# alter system set synchronous_standby_names ='';
- ALTER SYSTEM
- mydb=# select pid ,usesysid,usename,client_addr,state,sync_state from pg_stat_replication;
- pid | usesysid | usename | client_addr | state | sync_state
- -------+----------+---------+--------------+-----------+------------
- 14143 | 16406 | repl | 192.168.2.81 | streaming | async
- (1 row)
-
- mydb=#
-- 关于备库的只读模式,如果修改standby_mode = off,这个值是不起作用的(官方文档有说明),如果没有设置standby_mode参数,备库是无法启动的 。可以在日志中看到。这个时候,备库以为没有restore完毕,会让设置restore_command来进行继续restore 。
- [postgres@test1 /opt/PostgreSQL/10/data/log]$tail -f postgresql-2021-07-14_094125.log
- 2021-07-14 09:41:25.884 CST [16591] LOG: database system was shut down in recovery at 2021-07-14 09:36:24 CST
- 2021-07-14 09:41:25.884 CST [16591] FATAL: recovery command file "recovery.conf" must specify restore_command when standby mode is not enabled
- 2021-07-14 09:41:25.885 CST [16589] LOG: startup process (PID 16591) exited with exit code 1
- 2021-07-14 09:41:25.885 CST [16589] LOG: aborting startup due to startup process failure
- 2021-07-14 09:41:25.886 CST [16589] LOG: database system is shut down
- ^C
-- 在备库上清理日志
备库的wal日志存放在/opt/PostgreSQL/10/data/pg_wal目录下,清理该日志的命令如下(写在备库的recovery.conf中)
archive_cleanup_command = 'pg_archivecleanup /opt/PostgreSQL/10/data/pg_wal %r'
-- 使用复制槽,使用复制槽,有这些好处 。
复制槽提供了一种自动化的方法来确保主控机在所有的后备机收到 WAL 段 之前不会移除它们,并且主控机也不会移除可能导致 恢复冲突的行,即使后备机断开也是如此。
作为复制槽的替代,也可以使用wal_keep_segments 阻止移除旧的 WAL 段,或者使用archive_command 把段保存在一个归档中。不过,这些方法常常会导致保留的 WAL 段比需要的 更多,而复制槽只保留已知所需要的段数量。这些方法的一个优点是它们为 pg_wal的空间需求提供了界限,但目前使用复制槽无法做到。
类似地,hot_standby_feedback和 vacuum_defer_cleanup_age保护了相关行不被 vacuum 移除,但是前者在后备机断开期间无法提供保护,而后者则需要被设置为一个很高 的值以提供足够的保护。复制槽克服了这些缺点。
- SELECT * FROM pg_create_physical_replication_slot('node_a_slot');
- SELECT * FROM pg_replication_slots;
编译备库的recovery.conf ,添加复制槽
- primary_slot_name = 'node_a_slot'
- [postgres@test1 /opt/PostgreSQL/10/data]$more recovery.conf
- standby_mode=on
- primary_conninfo = 'user=repl password=oracle host=192.168.2.80 port=5432 sslmode=prefer sslcompress
- ion=1 krbsrvname=postgres target_session_attrs=any'
- archive_cleanup_command = 'pg_archivecleanup /opt/PostgreSQL/10/data/pg_wal %r'
- primary_slot_name = 'node_a_slot'
-
- [postgres@test1 /opt/PostgreSQL/10/data]$
-- 通过pg_controldata查看主备库的情况
- [postgres@test /opt/PostgreSQL/10/data]$pg_controldata
- pg_control version number: 1002
- Catalog version number: 201707211
- Database system identifier: 6984321128679731040
- Database cluster state: in production
- pg_control last modified: Wed 14 Jul 2021 09:32:29 AM CST
- Latest checkpoint location: 0/301F878
- Prior checkpoint location: 0/301F660
- Latest checkpoint's REDO location: 0/301F840
- Latest checkpoint's REDO WAL file: 000000010000000000000003
- Latest checkpoint's TimeLineID: 1
- Latest checkpoint's PrevTimeLineID: 1
- Latest checkpoint's full_page_writes: on
- Latest checkpoint's NextXID: 0:571
- Latest checkpoint's NextOID: 24599
- Latest checkpoint's NextMultiXactId: 1
- Latest checkpoint's NextMultiOffset: 0
- Latest checkpoint's oldestXID: 549
- Latest checkpoint's oldestXID's DB: 1
- Latest checkpoint's oldestActiveXID: 571
- Latest checkpoint's oldestMultiXid: 1
- Latest checkpoint's oldestMulti's DB: 1
- Latest checkpoint's oldestCommitTsXid:0
- Latest checkpoint's newestCommitTsXid:0
- Time of latest checkpoint: Wed 14 Jul 2021 09:32:28 AM CST
- Fake LSN counter for unlogged rels: 0/1
- Minimum recovery ending location: 0/0
- Min recovery ending loc's timeline: 0
- Backup start location: 0/0
- Backup end location: 0/0
- End-of-backup record required: no
- wal_level setting: replica
- wal_log_hints setting: off
- max_connections setting: 100
- max_worker_processes setting: 8
- max_prepared_xacts setting: 0
- max_locks_per_xact setting: 64
- track_commit_timestamp setting: off
- Maximum data alignment: 8
- Database block size: 8192
- Blocks per segment of large relation: 131072
- WAL block size: 8192
- Bytes per WAL segment: 16777216
- Maximum length of identifiers: 64
- Maximum columns in an index: 32
- Maximum size of a TOAST chunk: 1996
- Size of a large-object chunk: 2048
- Date/time type storage: 64-bit integers
- Float4 argument passing: by value
- Float8 argument passing: by value
- Data page checksum version: 0
- Mock authentication nonce: e560ba314666f9509c6d5abe645a9aa6e30d7f133a15e2bfb028fb37e2d43ef3
- [postgres@test /opt/PostgreSQL/10/data]$

- [postgres@test1 /opt/PostgreSQL/10/data]$pg_controldata
- pg_control version number: 1002
- Catalog version number: 201707211
- Database system identifier: 6984321128679731040
- Database cluster state: in archive recovery
- pg_control last modified: Wed 14 Jul 2021 09:44:06 AM CST
- Latest checkpoint location: 0/301F878
- Prior checkpoint location: 0/301F878
- Latest checkpoint's REDO location: 0/301F840
- Latest checkpoint's REDO WAL file: 000000010000000000000003
- Latest checkpoint's TimeLineID: 1
- Latest checkpoint's PrevTimeLineID: 1
- Latest checkpoint's full_page_writes: on
- Latest checkpoint's NextXID: 0:571
- Latest checkpoint's NextOID: 24599
- Latest checkpoint's NextMultiXactId: 1
- Latest checkpoint's NextMultiOffset: 0
- Latest checkpoint's oldestXID: 549
- Latest checkpoint's oldestXID's DB: 1
- Latest checkpoint's oldestActiveXID: 571
- Latest checkpoint's oldestMultiXid: 1
- Latest checkpoint's oldestMulti's DB: 1
- Latest checkpoint's oldestCommitTsXid:0
- Latest checkpoint's newestCommitTsXid:0
- Time of latest checkpoint: Wed 14 Jul 2021 09:32:28 AM CST
- Fake LSN counter for unlogged rels: 0/1
- Minimum recovery ending location: 0/301F920
- Min recovery ending loc's timeline: 1
- Backup start location: 0/0
- Backup end location: 0/0
- End-of-backup record required: no
- wal_level setting: replica
- wal_log_hints setting: off
- max_connections setting: 100
- max_worker_processes setting: 8
- max_prepared_xacts setting: 0
- max_locks_per_xact setting: 64
- track_commit_timestamp setting: off
- Maximum data alignment: 8
- Database block size: 8192
- Blocks per segment of large relation: 131072
- WAL block size: 8192
- Bytes per WAL segment: 16777216
- Maximum length of identifiers: 64
- Maximum columns in an index: 32
- Maximum size of a TOAST chunk: 1996
- Size of a large-object chunk: 2048
- Date/time type storage: 64-bit integers
- Float4 argument passing: by value
- Float8 argument passing: by value
- Data page checksum version: 0
- Mock authentication nonce: e560ba314666f9509c6d5abe645a9aa6e30d7f133a15e2bfb028fb37e2d43ef3
- [postgres@test1 /opt/PostgreSQL/10/data]$

-- 主备库上的postgres进程
- [root@test /root]$ps -ef | grep postgres
- root 11724 11599 0 08:58 pts/1 00:00:00 su - postgres
- postgres 11725 11724 0 08:58 pts/1 00:00:00 -bash
- postgres 12613 1 0 09:14 pts/1 00:00:00 /opt/PostgreSQL/10/bin/postgres -D /opt/PostgreSQL/10/data
- postgres 12614 12613 0 09:14 ? 00:00:00 postgres: logger process
- postgres 12616 12613 0 09:14 ? 00:00:00 postgres: checkpointer process
- postgres 12617 12613 0 09:14 ? 00:00:00 postgres: writer process
- postgres 12618 12613 0 09:14 ? 00:00:00 postgres: wal writer process
- postgres 12619 12613 0 09:14 ? 00:00:00 postgres: autovacuum launcher process
- postgres 12620 12613 0 09:14 ? 00:00:00 postgres: archiver process last was 000000010000000000000002.00000028.backup
- postgres 12621 12613 0 09:14 ? 00:00:00 postgres: stats collector process
- postgres 12622 12613 0 09:14 ? 00:00:00 postgres: bgworker: logical replication launcher
- postgres 13149 12613 0 09:25 ? 00:00:00 postgres: postgres mydb [local] idle
- postgres 13341 12613 0 09:28 ? 00:00:00 postgres: postgres postgres 192.168.2.240(60256) idle
- postgres 13343 12613 0 09:28 ? 00:00:00 postgres: postgres repdb 192.168.2.240(55611) idle
- postgres 13346 12613 0 09:28 ? 00:00:00 postgres: postgres mydb 192.168.2.240(56043) idle
- postgres 13353 12613 0 09:28 ? 00:00:00 postgres: postgres mydb 192.168.2.240(55401) idle
- postgres 14143 12613 0 09:44 ? 00:00:00 postgres: wal sender process repl 192.168.2.81(46563) streaming 0/301F920
- root 23017 18649 0 12:34 pts/1 00:00:00 grep --color=auto postgres
- [root@test /root]$

- [postgres@test1 /opt/PostgreSQL/10/data]$ps -ef | grep postgres
- root 14519 14478 0 09:10 pts/1 00:00:00 su - postgres
- postgres 14520 14519 0 09:10 pts/1 00:00:00 -bash
- root 15376 8324 0 09:23 pts/0 00:00:00 su - postgres
- postgres 15378 15376 0 09:23 pts/0 00:00:00 -bash
- postgres 16757 1 0 09:44 pts/1 00:00:00 /opt/PostgreSQL/10/bin/postgres
- postgres 16758 16757 0 09:44 ? 00:00:00 postgres: logger process
- postgres 16759 16757 0 09:44 ? 00:00:00 postgres: startup process recovering 000000010000000000000003
- postgres 16760 16757 0 09:44 ? 00:00:00 postgres: checkpointer process
- postgres 16761 16757 0 09:44 ? 00:00:00 postgres: writer process
- postgres 16762 16757 0 09:44 ? 00:00:00 postgres: stats collector process
- postgres 16763 16757 0 09:44 ? 00:00:10 postgres: wal receiver process
- postgres 16874 15378 0 09:45 pts/0 00:00:00 tail -f postgresql-2021-07-14_094406.log
- root 21998 21641 0 11:11 pts/2 00:00:00 su - postgres
- postgres 21999 21998 0 11:11 pts/2 00:00:00 -bash
- postgres 22501 16757 0 11:17 ? 00:00:00 postgres: postgres postgres [local] idle
- postgres 26920 21999 0 12:34 pts/2 00:00:00 ps -ef
- postgres 26921 21999 0 12:34 pts/2 00:00:00 grep --color=auto postgres
- [postgres@test1 /opt/PostgreSQL/10/data]$

END
-- 2021-08-23 add
使用了事务槽,或者复制槽后,在PG的文件夹中会显示。
- [postgres@test /opt/PostgreSQL/10/data]$ls
- base pg_commit_ts pg_logical pg_serial pg_subtrans pg_wal postmaster.opts
- current_logfiles pg_dynshmem pg_multixact pg_snapshots pg_tblspc pg_xact postmaster.pid
- global pg_hba.conf pg_notify pg_stat pg_twophase postgresql.auto.conf
- log pg_ident.conf pg_replslot pg_stat_tmp PG_VERSION postgresql.conf
-
- postgres@test /opt/PostgreSQL/10/data/pg_replslot]$cd node_a_slot/
- [postgres@test /opt/PostgreSQL/10/data/pg_replslot/node_a_slot]$ls
- state
- [postgres@test /opt/PostgreSQL/10/data/pg_replslot/node_a_slot]$cd state
- -bash: cd: state: Not a directory
-
- [postgres@test /opt/PostgreSQL/10/data/pg_replslot/node_a_slot]$more state
- ¡Áׇâ
- [postgres@test /opt/PostgreSQL/10/data/pg_replslot/node_a_slot]$strings state
- node_a_slot
- [postgres@test /opt/PostgreSQL/10/data/pg_replslot/node_a_slot]$

END
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。