【PostgreSQL-patroni维护命令】

发布于:2024-09-18 ⋅ 阅读:(82) ⋅ 点赞:(0)

手册
https://patroni.readthedocs.io/en/latest/index.html

patroni运维

1.列出节点信息

patronictl -c /etc/patroni.yml list

2.重做备库

reinit先是移除了整个data目录。然后选择正确的节点进行备份恢复。

patronictl -c /etc/patroni.yml reinit patnori-test

# patronictl -c /etc/patroni.yml reinit pgsql
+ Cluster: pgsql (6972099274779350082)+------+---------+----+-----------+
|   Member    |        Host         |  Role  |  State  | TL | Lag in MB |
+-------------+---------------------+--------+---------+----+-----------+
| pgsql_node1 | 192.168.22.128:5432 |        | running | 3  |      0    |
| pgsql_node2 | 192.168.22.129:5432 | Leader | running | 3  |           |
| pgsql_node3 | 192.168.22.130:5432 |        | running | 3  |      0    |
+-------------+---------------------+--------+---------+----+-----------+
Select the following node names to add: pgsql_node3
 Are you sure you want to reinitialize members pgsql_node3?[y/N]: y
 Success: for member pgsql_node3 Perform initialization

参数查看参数

patronictl -c /etc/patroni.yaml show-config

更改参数
patronictl -c /etc/patroni.yaml edit-config

reload加载参数(同时在三个节点生效)
patronictl -c /etc/patroni.yaml reload patnori-test

重启节点/关闭节点

加- -force 强制
加- -scheduled 2023-09-13T18:00-03:00 定时

1.仅重启当前节点(node1节点)
[root@pgtest1 ~]# patronictl restart pg_cluster node1

2.如果节点是 pending 状态的,才会执行重启操作
[root@pgtest1 ~]# patronictl restart pg_cluster --pending

3.重启所有成员
[root@pgtest1 ~]# patronictl restart pg_cluster

维护模式,脱离patroni的集群管理

patronictl pause

patronictl pause暂时将Patroni集群置于维护模式并禁用自动故障转移。

在某些情况下,Patroni需要暂时退出集群管理,同时仍然在DCS中保留集群状态。可能的用例是集群上不常见的活动,例如主要版本升级或损坏恢复。在这些活动期间,节点经常因为Patroni不知道的原因而启动和停止,有些节点甚至可以暂时提升,这违反了只运行一个主节点的假设。因此,Patroni需要能够与正在运行的集群“分离”,在Pacemaker中实现与维护模式相当的功能。

patronictl resume

patronictl resume将使Patroni集群退出维护模式,并重新启用自动故障转移。
自动拉起所有数据库

switchover切换

patronictl switchover

# Switchover
[postgres@pgtest1 ~]$ patronictl switchover
Master [pgtest1]: 
Candidate ['pgtest2', 'pgtest3'] []: pgtest2
When should the switchover take place (e.g. 2021-10-28T04:45 )  [now]: 
Current cluster topology
... ...
Are you sure you want to switchover cluster pg_cluster, demoting current master pgtest1? [y/N]: y
2021-10-28 03:45:35.91763 Successfully switched over to "pgtest2"
... ...
 数据库从 pgtest1 switchover 到 pgtest2
[root@pgtest1 ~]# curl -s http://192.168.58.10:8008/switchover -XPOST -d '{"leader":"pgtest1","candidate":"pgtest2"}'
Successfully switched over to "pgtest2"

failover切换

patronictl failover

# Failover
[postgres@pgtest1 ~]$ patronictl failover
Candidate ['pgtest1', 'pgtest3'] []: pgtest1
Current cluster topology
... ...
Are you sure you want to failover cluster pg_cluster, demoting current master pgtest2? [y/N]: y
2021-10-28 03:47:56.13486 Successfully failed over to "pgtest1"
... ...

执行到特定节点的 failover,在节点都正常的情况下,执行 failover 实际上和执行 Switchover 一样

curl -s http://192.168.58.10:8008/failover -XPOST -d ‘{“candidate”:“pgtest2”}’

[root@pgtest1 ~]# patronictl list
+---------+---------------+---------+---------+----+-----------+
| Member  | Host          | Role    | State   | TL | Lag in MB |
+ Cluster: pg_cluster (7025023477017500881) --+----+-----------+
| pgtest1 | 192.168.58.10 | Leader  | running |  5 |           |
| pgtest2 | 192.168.58.11 | Replica | running |  5 |         0 |
| pgtest3 | 192.168.58.12 | Replica | running |  5 |         0 |
+---------+---------------+---------+---------+----+-----------+
[root@pgtest1 ~]# curl -s http://192.168.58.10:8008/failover -XPOST -d '{"candidate":"pgtest2"}'

使用 patronictl 执行数据库查询操作

[postgres@pgtest1 ~]$ cat aa.sql
select * from test_1;
[postgres@pgtest1 ~]$ patronictl -c /enmo/app/patroni/pg_test01.yml query -f aa.sql --password
Password: 
id      create_time
1       2021-10-16 17:47:34
2       2021-10-16 17:55:06
[postgres@pgtest1 ~]$ patronictl -c /enmo/app/patroni/pg_test01.yml query -c "select * from test_1;" --password
Password: 
id      create_time
1       2021-10-16 17:47:34
2       2021-10-16 17:55:06

获取主节点dsn信息

$ ./patronictl -c postgres.yml dsn pgsql

host=192.168.1.143 port=5432

ETCD

1.查看etcd节点
etcdctl endpoint status --cluster -w table

 +---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|         ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://192.168.58.11:2379 | 3f414532c235ce16 |   3.5.1 |   20 kB |     false |      false |         4 |         16 |                 16 |        |
| http://192.168.58.12:2379 | 41cf8a739c2e9b50 |   3.5.1 |   20 kB |     false |      false |         4 |         16 |                 16 |        |
| http://192.168.58.10:2379 | caef4208a95efee8 |   3.5.1 |   25 kB |      true |      false |         4 |         16 |                 16 |        |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

2.转移leader节点

etcdctl move-leader 3f414532c235ce16

3.保存数据快照(一个节点执行)
etcdctl snapshot save etcd_bak.db

4.查看快照信息
etcdctl snapshot status etcd_bak.db -w table

[root@pgtest1 ~]# etcdctl snapshot status etcd_bak.db -w table 
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 230fea56 |        0 |          8 |      25 kB |
+----------+----------+------------+------------+

Zookeeper

1.查看zookeeper状态

./zkServer.sh status

2.启动zookeeper

./zkServer.sh start

3.停止zookeeper

./zkServer.sh stop

4.连接到zookeeper服务

./zkCli.sh -server localhost:2181

ls(查看当前节点数据),
ls2(查看当前节点数据并能看到更新次数等数据)
create(创建一个节点)
get(得到一个节点,包含数据和更新次数等数据)
set(修改节点)
delete(删除一个节点)

5.查看patroni key值

ls

ls /service/batman
查看相关key值

[leader, optime, failover, members, initialize, history, config, sync]
 batman是patroni配置文件中scope的名字。

leader(主节点的名字)

get /service/batman/leader
leader记录主节点的名字,是临时节点,当session时间超过ttl后未响应,zookeeper就会删除该节点。

zk: localhost:2181(CONNECTED) 33] get /service/batman/leader
 postgresql1

sync(同步复制的状态)

get /service/batman/sync
sync记录同步复制的状态,是持久节点不会因为session到期,删除该key值

[zk: localhost:2181(CONNECTED) 30] get /service/batman/sync  
{"leader":"postgresql0","sync_standby":"postgresql1"}

optime(主库最后一次lsn位置)

get /service/batman/optime/leader
optime/leader是主库最后一次操作后的lsn位置,是持久节点不会因为session到期,删除该key值。

[zk: localhost:2181(CONNECTED) 71] get /service/batman/optime/leader      
1342177280

failover(记录计划的切换任务)

get /service/batman/failover
failover是记录计划的切换任务,是持久节点不会因为session到期,删除该key值。

 [zk: localhost:2181(CONNECTED) 117] get /service/batman/failover
 {"leader":"postgresql1","scheduled_at":"2018-11-09T14:30:00+08:00"}

members(记录所有数据节点的连接信息和重要的状态信息)

get /service/batman/members/postgresql0
get /service/batman/members/postgresql

members是分别记录了所有数据节点的连接信息和重要的状态信息,是临时节点,当session时间超过ttl后未响应,zookeeper就会删除该节点。

[zk: localhost:2181(CONNECTED) 120] get /service/batman/members/postgresql0 
{"conn_url":"postgres://192.168.56.5:5432/postgres","api_url":"http://192.168.56.5:8008
 /patroni","timeline":58,"state":"running","role":"replica","xlog_location":1426063952}

initialize(记录数据库集群初始化的信息)

get /service/batman/initialize
initialize记录了数据库集群初始化的信息,是持久节点不会因为session到期,删除该key值。

[zk: localhost:2181(CONNECTED) 123] get /service/batman/initialize         
6618183861621602635

//这个id是控制文件信息里的 Database system identifier
[postgres@node2 ~]$ pg_controldata
pg_control version number:     1100     
Catalog version number:        201809051        
Database system identifier:    6618183861621602635   

history(记录的是集群中时间线变化的过程)

get /service/batman/history
history记录的是集群中时间线变化的过程,是持久节点不会因为session到期,删除该key值

[zk: localhost:2181(CONNECTED) 26] get /service/batman/history
 [[1,67109464,"no recovery target specified"],[2,83886232,"no recovery target 
specified"],[3,100663448,"no recovery target specified"],[4,218103960,"no recovery 
target specified"],[5,251658392,"no recovery target specified"],[6,268435608,"no 
recovery target specified"],[7,318767256,"no recovery target specified"],
 [8,335544472,"no recovery target specified"],[9,352321688,"no recovery target 
specified"],[10,369098904,"no recovery target specified"],[11,402653336,"no recovery 
target specified"],[12,419430552,"no recovery target specified"],[13,436207768,"no 
recovery target specified"],[14,452984984,"no recovery target specified"],
 [15,469762200,"no recovery target specified"],[16,486539416,"no recovery target 
specified"],[17,503316632,"no recovery target specified"],[18,536871064,"no recovery 
target specified"],[19,553648280,"no recovery target specified"],[20,570425496,"no 
recovery target specified"],[21,603979928,"no recovery target specified"],
 [22,620757144,"no recovery target specified"],[23,637534360,"no recovery target 
specified"],[24,654311576,"no recovery target specified"],[25,671088792,"no recovery 
target specified","2018-11-05T16:44:52+08:00"],[26,704643224,"no recovery target 
specified","2018-11-05T16:49:12+08:00"],[27,721420440,"no recovery target 
specified","2018-11-05T23:21:12+08:00"],[28,771752088,"no recovery target 
specified","2018-11-06T04:32:40+08:00"],[29,805306520,"no recovery target 
specified","2018-11-06T05:22:45+08:00"],[30,822083736,"no recovery target 

config(记录的是patroni配置文件的配置信息)

get /service/batman/config
config记录patroni配置文件的配置信息,是持久节点不会因为session到期,删除该key值

[zk: localhost:2181(CONNECTED) 28] get /service/batman/config 
{"retry_timeout": 10, "postgresql": {"use_slots": true, "use_pg_rewind": true, 
"parameters": {"hot_standby": "on", "wal_keep_segments": 8, "wal_level": "hot_standby", 
"archive_command": "mkdir -p ../wal_archive && test ! -f ../wal_archive/%f && cp %p 
../wal_archive/%f", "wal_log_hints": "on", "max_wal_senders": 10, "archive_timeout": 
2000, "archive_mode": "on", "max_replication_slots": 10, "max_connections": 300}, 
"recovery_conf": {"restore_command": "cp ../wal_archive/%f %p"}}, "synchronous_mode": 
true, "maximum_lag_on_failover": 1048576, "loop_wait": 10, "max_connection": 300, 
"archive_timeout": "2000s", "ttl": 30, "max_connections": 300}

6.zk清理快照

保留最近20个snap文件

./zkCleanup.sh -n 20

使用一些linux命令删除n天前的数据或日志

find /zookeeperData/version-2/ -name "snap*" -mtime +10 | xargs rm -f
find /zookeeperDataLog/version-2/ -name "log*" -mtime +10 | xargs rm -f
find /opt/apache-zookeeper-3.7.1-bin/logs -name "zookeeper.log.*" -mtime +10 | xargs rm –f

配置自动清理日志
从3.4.0开始,会自动清理日志了,所以这个通常不用配置。
配置autopurge.snapRetainCount和autopurge.purgeInterval参数。
保留的snapshop的数量,默认是3个,最小也是3。

autopurge.snapRetainCount=3
autopurge.purgeInterval=1

3.4.0之前的版本可以通过zookeeper的配置自行对snap进行管理。如下这三个参数分别表示一个小时清理一次,log的大小(单位是kb)和快照的数量。

autopurge.purgeInterval=1
preAllocSize=131072
snapCount=300000

7.通过下面命令查看zookeeper启动的各个参数,包括java路径等

./bin/zkServer.sh print-cmd

参考链接

1.https://www.modb.pro/db/73762
2.https://www.modb.pro/db/75268
3.https://www.modb.pro/topic/152353
升级
1.https://www.modb.pro/db/500077