赞
踩
repmgr用于管理 PostgreSQL 服务器集群内的复制和故障转移,类似dg-broker,可以实现swichover和failover。
repmgr架构图如下:
每个节点都可通过 repmgrd 守护进程来监控节点数据库状态;每个节点元数据表可独立维护,这些元数据表将记录集群所有节点的信息。
在发生failover时,备节点尝试多次连接主节点失败后(尝试次数和间隔可通过repmgr.conf配置),repmgrd会在所有备节点中选举一个作为新主节点(repmgr 选举新主节点按照以下顺序选举:LSN > Priority > Node_ID),其他备节点会follow新主节点,形成新的集群。
repmgr主要提供了repmgr和repmgrd两个工具:
repmgr 是一个执行管理任务的命令行工具,方便进行 PostgreSQL 服务器集群的管理,主要功能有:
repmgrd作为守护进程,监视集群中节点的状态,主要功能如下:
为了有效地管理复制集群,repmgr 需要将集群中节点的相关信息存储在 repmgr 专用数据库表中。包含以下对象:
表:
repmgr.events: records events of interest
repmgr.nodes: 复制集群中每个节点的连接和状态信息
repmgr.monitoring_history: repmgrd 写入的历史备用监控信息
视图:
repmgr.show_nodes: 基于 repmgr.nodes 表,另外显示服务器上游节点的名称
repmgr.replication_status: 当启用 repmgrd 的监控时,显示每个 standby 的监控状态
注:
repmgr 元数据信息只能存储在被此repmgr管理的集群中,且需要可以访问数据库和执行更改的管理用户,由于repmgr的初始安装需要超级用户权限,所以此管理用户需要被赋予superuser权限。
主机名 | IP | node_id | node_name | priority | 备注 |
pr-obsrv01 | 10.10.237.11 | 1 | obsrv01 | 默认100 | pg、repmgr、repmgrd |
pr-obsrv02 | 10.10.237.12 | 2 | obsrv02 | 默认100 | pg、repmgr、repmgrd |
pr-obsrv03 | 10.10.237.13 | 3 | obsrv03 | 默认100 | pg、repmgr、repmgrd |
repmgr - Replication Manager for PostgreSQL clusters
Release notes: repmgr - Replication Manager for PostgreSQL clusters
可选择最新版本(最新版本5.4.1支持pg9.4~pg15)
- systemctl stop firewalld
- systemctl disable firewalld
-
- vi /etc/selinux/config
- SELINUX=disabled
-
- reboot
- mkdir /pg
- useradd -d /pg pg
- passwd pg
- cp /etc/skel/* /pg
- chown -R pg.pg /pg
- su - pg
- vi .bash_profile
- export PGHOME=/pg
- export PGDATA=$PGHOME/data
- export PATH=$PATH:$PGHOME/bin
- export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PGHOME/lib
cat /etc/hosts
10.10.237.11 pr-obsrv01
10.10.237.12 pr-obsrv02
10.10.237.13 pr-obsrv03
- su - pg
- ssh-keygen -t rsa
-
- ssh-copy-id -i /pg/.ssh/id_rsa pr-obsrv01
- ssh-copy-id -i /pg/.ssh/id_rsa pr-obsrv02
- ssh-copy-id -i /pg/.ssh/id_rsa pr-obsrv03
-
- 节点1执行:
- ssh pr-obsrv02
- ssh pr-obsrv03
- 节点2执行:
- ssh pr-obsrv01
- ssh pr-obsrv03
- 节点3执行:
- ssh pr-obsrv01
- ssh pr-obsrv02
- su - pg
- mkdir data
- mkdir log
- mkdir archive_wals
- yum -y install libselinux-devel
- yum -y install libxml2-devel
- yum -y install libxslt-devel
- yum -y install openssl-devel
- yum -y install pam-devel
- yum -y install readline-devel
- su - pg
- tar -xvf repmgr-5.4.1.tar.gz
- mv repmgr-5.4.1 repmgr
- cd repmgr
- ./configure
- make install
- repmgr -V
安装后会在/pg/bin下生成repmgr和repmgrd两个可执行文件;在/pg/share/postgresql/extension/下生成repmgr文件;在/pg/lib/postgresql下生成repmgr.so
注意:不要将repmgr.conf放在PGDATA目录下
- cat repmgr.conf
- node_id=1
- node_name=obsrv01
- conninfo='host=10.10.237.11 port=5434 user=repmgr password=repmgr dbname=repmgr connect_timeout=5'
- data_directory='/pg/data'
- pg_bindir='/pg/bin'
-
- location='default'
- failover='automatic'
- promote_command='/pg/bin/repmgr standby promote -f /pg/repmgr.conf --log-to-file'
- follow_command='/pg/bin/repmgr standby follow -f /pg/repmgr.conf --log-to-file --upstream-node-id=%n'
- log_file='/pg/log/repmgrd.log'
- monitoring_history=true
- monitor_interval_secs=5
- reconnect_attempts=6
- reconnect_interval=5
注:其他两个节点除了 node_id、node_name和conninfo不同,其他配置均相同
- su - pg
- tar -xvf postgresql-13.11.tar.gz
- cd postgresql-13.11
- ./configure --prefix=/pg
- make -j 8
- make install
initdb -D /pg/data
- listen_addresses = '*' # what IP address(es) to listen on;
- port = 5434 # (change requires restart)
- max_connections = 100 # (change requires restart)
- shared_buffers = 128MB # min 128kB
- wal_level = replica # minimal, replica, or logical
- max_wal_size = 1GB
- min_wal_size = 80MB
- archive_mode = on # enables archiving; off, on, or always
- archive_command = 'test ! -f /pg/archive_wals/%f && cp %p /pg/archive_wals/%f' # command to use to archive a logfile segment
- max_wal_senders = 10 # max number of walsender processes
- hot_standby = on # "off" disallows queries during recovery
- logging_collector = on # Enable capturing of stderr and csvlog
- log_directory = '/pg/log' # directory where log files are written,
- log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # log file name pattern,
- log_file_mode = 0600 # creation mode for log files,
- log_rotation_age = 1d # Automatic rotation of logfiles will
- log_rotation_size = 10MB # Automatic rotation of logfiles will
- log_timezone = 'Asia/Shanghai'
- shared_preload_libraries = 'repmgr' # (change requires restart)
- wal_log_hints = 'on' # pg_rewind需要开启wal_log_hints,否则无法修复failover的旧主节点只能重建
- local replication repmgr trust
- host replication repmgr 127.0.0.1/32 trust
- host replication repmgr 10.10.237.11/25 trust
- host replication repmgr 10.10.237.12/25 trust
- host replication repmgr 10.10.237.13/25 trust
- local repmgr repmgr trust
- host repmgr repmgr 127.0.0.1/32 trust
- host repmgr repmgr 10.10.237.11/25 trust
- host repmgr repmgr 10.10.237.12/25 trust
- host repmgr repmgr 10.10.237.13/25 trust
- su - pg
- pg_ctl start
- psql -Upostgres -p5434
- create user repmgr superuser password 'repmgr';
- ALTER USER repmgr SET search_path TO repmgr, "$user", public;
- create database repmgr owner repmgr;
- repmgr -f /pg/repmgr.conf primary register
- repmgr -f /pg/repmgr.conf cluster show
或者在元数据库中查询
- psql -Urepmgr -drepmgr -p5434
- select * from nodes;
首先--dry-run检查extension和必要参数
repmgr -h 10.10.237.11 -U repmgr -p 5434 -d repmgr -f /pg/repmgr.conf standby clone --dry-run
没问题再进行clone
repmgr -h 10.10.237.11 -U repmgr -p 5434 -d repmgr -f /pg/repmgr.conf standby clone
最后HINT提示需要起postgres服务个register从节点
- pg_ctl start
- repmgr -f /pg/repmgr.conf standby register
- repmgr -f /pg/repmgr.conf cluster show
- su - pg
- touch /pg/log/repmgrd.log
- repmgrd -f /pg/repmgr.conf start
某从节点执行
repmgr -f /pg/repmgr.conf standby switchover --siblings-follow
将主节点的postgres进程kill掉看能否实现自动故障转移
恢复原主节点可以使用以下命令
repmgr -f /pg/repmgr.conf node rejoin -d 'host=10.10.237.11 dbname=repmgr user=repmgr password=repmgr port=5434' --force-rewind
参考文档:
Copyright © 2003-2013 www.wpsshop.cn 版权所有,并保留所有权利。