Today I saw a post on the /r/saltstack subreddit asking for help setting up replication for PostgreSQL 12 using SaltStack. I recently had to solve the same problem for our OpenStack clusters at my employer, iland Cloud, so I thought it may be helpful to share what we built.
I'll provide all of the necessary states to set up PostgreSQL, as well as the states used to configure replication. All of these states were written for deployment on Ubuntu 18.04 running SaltStack 2019.2.3.
This blog post assumes two database servers per cluster:
openstack-db01.res01.example.com
- Master during normal operationsopenstack-db02.res01.example.com
- Secondary during normal operations
In this post we will be using streaming-based replication and continuous archiving.
We also will make use of custom grains, pillar data, and the salt mine, which are detailed below.
Custom Grains
These states make use of a few custom grains that are set based on the server's hostname. For example, server with a hostname of openstack-db02.res01.example.com
would have the following grains set:
dc: res01
nodetype: openstack-db
nodenumber: 2
In these examples, the server with nodenumber
set to 1
indicates the server that is normally serving as master.
You can adjust the variables in the states to suit your environment, but just be aware what these grains are referencing when used in the states in this post.
Pillar Variables
We also need to set the pillar data that is used by the various states.
# Our openstack-db servers use the PGDG apt repos to install postgresql
openstack_postgresql_version: 12+213.pgdg18.04+1
openstack:
postgres:
repl_username: ha_replication
repl_password: <set this, use the gpg renderer>
Salt Mine
Our Salt minions have Salt Mine enabled to allow other minions to fetch network.ip_addrs
from one another. This feature is used on the PostgreSQL states to automatically detect the ip addresses of the other database servers in the same cluster.
States Needed for the PostgreSQL Server
We use the PostgreSQL 12 packages provided by postgresql.org. They also maintain a FAQ.
The /srv/salt/openstack-db/packages.sls
file:
postgresql-gpg-key:
cmd.run:
- name: wget -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -
- unless: /usr/bin/test "$(apt-key export ACCC4CF8 2> /dev/null | sha1sum -| awk '{print $1}')" = "403374e8f22266f67fe14b79b8257491dce75af7"
/etc/apt/sources.list.d/postgresql.list:
file.managed:
- source: salt://common/etc/apt/sources.list.d/postgresql.list
- template: jinja
- require:
- cmd: postgresql-gpg-key
cmd.run:
- name: apt-get update -o Dir::Etc::sourcelist="sources.list.d/postgresql.list" -o Dir::Etc::sourceparts="-" -o APT::Get::List-Cleanup="0"
- onchanges:
- file: /etc/apt/sources.list.d/postgresql.list
postgresql:
pkg.installed:
- version: {{ salt.pillar.get('openstack_postgresql_version') }}
- hold: True
The salt://common/etc/apt/sources.list.d/postgresql.list
file referenced above contains this:
# no pinning necessary:
# https://wiki.postgresql.org/wiki/Apt/FAQ#I_want_only_specific_packages_from_this_repository
deb http://apt.postgresql.org/pub/repos/apt/ {{ grains.get('oscodename') }}-pgdg main
The /srv/salt/openstack-db/files.sls
file:
# The first line sets a "target" string for the mine.get command that follows it,
# which fetchs a list of the primary ip addresses for any servers in this datacenter
# where nodetype is set to openstack-db
{%- set _openstack_db_tgt = 'G@nodetype:openstack-db and G@dc:%s'|format(grains.get('dc')) -%}
{%- set openstack_db_ips = salt['mine.get'](_openstack_db_tgt, 'network.ip_addrs', tgt_type='compound').values() | map('first') -%}
{%- set pg = salt.pillar.get('openstack:postgres') -%}
{%- set pg_major_version = salt.pillar.get('openstack_postgresql_version').split('+')[0] -%}
/var/lib/postgresql/pg{{ pg_major_version }}_wal:
file.directory:
- user: postgres
- group: postgres
- mode: 0750
- require:
- pkg: postgresql
- listen_in:
- service: service-postgresql
/etc/postgresql/12/main/postgresql.conf:
file.managed:
- source: salt://openstack-db/etc/postgresql/12/main/postgresql.conf
- require:
- pkg: postgresql
- listen_in:
- service: service-postgresql
/etc/postgresql/12/main/pg_hba.conf:
file.managed:
- source: salt://openstack-db/etc/postgresql/12/main/pg_hba.conf
- template: jinja
- require:
- pkg: postgresql
- listen_in:
- service: service-postgresql
/var/lib/postgresql/.pgpass:
file.managed:
- contents:
{% for openstack_db_ip in openstack_db_ips %}
- {{ openstack_db_ip }}:5432:*:{{ pg.repl_username }}:{{ pg.repl_password }}
{% endfor %}
- user: postgres
- group: postgres
- mode: 600
You'll need to replace the various source: salt://
locations in the above with the paths to your files.
The /var/lib/postgresql/.pgpass
state above is critical to states that handle configuring replication as that file is used by the pg_basebackup
command to authenticate the secondary to the master.
Ensure PostgreSQL's pg_hba.conf
is configured to allow replication from your database server's ip addresses. The following is a snippet from the pg_hba.conf
file referenced above:
local replication postgres peer
local replication all peer
host replication all 127.0.0.1/32 md5
host replication all ::1/128 md5
host replication {{ pg.repl_username }} 127.0.0.1/32 md5
host replication {{ pg.repl_username }} <db01-ip>/32 md5
host replication {{ pg.repl_username }} <db02-ip>/32 md5
Also, the postgresql.conf
file should have the following options set:
archive_mode = on
archive_command = 'test ! -f /var/lib/postgresql/pg12_wal/%f && cp %p /var/lib/postgresql/pg12_wal/%f'
The /srv/salt/openstack-db/services.sls
file:
service-postgresql:
service.running:
- name: postgresql
- reload: True
- watch:
- pkg: postgresql
The /srv/salt/openstack-db/setup/create-users-and-grant-privileges.sls
file:
# this should only be run when initially setting up the primary db in the cluster
{% set pg = salt.pillar.get('openstack:postgres') %}
{% if grains.get('nodenumber') == 1 %}
postgresql-replication-user:
postgres_user.present:
- name: {{ pg.repl_username }}
- password: {{ pg.repl_password }}
- login: True
- replication: True
{% endif %}
The /srv/salt/openstack-db/setup/configure-secondary-for-replication.sls
file contains all the states necessary to configure a secondary server for replication:
# This sls file sets up a secondary postgresql server to replicate from the other.
# Require a pillar variable `yes_destroy_pg_datadir` be set to True on command line
# to override this to avoid accidentally deleting data on production database servers.
{%- set confirm_destroy = pillar.get('yes_destroy_pg_datadir', False) | to_bool -%}
# If we are being executed on openstack-db01 then set master to openstack-db02,
# otherwise default setting master to openstack-db01
{%- set other_nodenumber = 2 if grains.get('nodenumber') == 1 else 1 -%}
{%- set _db_master = 'G@nodetype:openstack-db and G@nodenumber:%s and G@dc:%s'|format(other_nodenumber, grains.get('dc')) -%}
{%- set db_master = salt['mine.get'](_db_master, 'network.ip_addrs', tgt_type='compound').values() | map('first') | list | first -%}
{%- set pg_major_version = salt.pillar.get('openstack_postgresql_version').split('+')[0] -%}
{%- set pg = salt.pillar.get('openstack:postgres') -%}
{% if grains.get('nodetype') == 'openstack-db' and confirm_destroy %}
include:
- openstack-db.packages
- openstack-db.files
service-postgresql:
service.dead:
- name: postgresql@{{ pg_major_version }}-main
- require:
- pkg: postgresql
mask-postgresql:
service.masked:
- name: postgresql@{{ pg_major_version }}-main
clear-out-postgres-data-files:
cmd.run:
- name: rm -rf /var/lib/postgresql/{{ pg_major_version }}/main/*
- runas: postgres
- require:
- service: service-postgresql
- service: mask-postgresql
clear-out-pgwal-files:
cmd.run:
- name: rm -rf /var/lib/postgresql/pg{{ pg_major_version }}_wal/*
- runas: postgres
- require:
- service: service-postgresql
- service: mask-postgresql
stream-basebackup-from-master:
cmd.run:
- name: |
/usr/bin/pg_basebackup \
-h {{ db_master }} \
-p 5432 \
-U {{ pg.repl_username }} \
-D /var/lib/postgresql/{{ pg_major_version }}/main/ \
--format=plain \
--wal-method=stream \
--write-recovery-conf \
--no-password
- runas: postgres
- require:
- cmd: clear-out-postgres-data-files
- cmd: clear-out-pgwal-files
- file: /var/lib/postgresql/.pgpass
unmask-postgresql:
service.unmasked:
- name: postgresql@{{ pg_major_version }}-main
{% endif %}
Configure Targetting in top.sls
Edit your /srv/salt/top.sls
file to include the packages.sls
, files.sls
, and services.sls
files from the /srv/salt/openstack-db
directory for all servers whose hostname begins with openstack-db
:
'openstack-db*':
- openstack-db.packages
- openstack-db.files
- openstack-db.services
Highstating the Servers
We are now ready to begin setting up the openstack-db PostgreSQL servers.
First, configure the master database server using the states above:
salt openstack-db01.res01.example.com state.highstate
salt openstack-db01.res01.example.com service.restart postgresql
salt openstack-db01.res01.example.com state.apply openstack-db.setup.create-users-and-grant-privileges
Next, configure the secondary database server:
salt openstack-db02.res01.example.com state.highstate
salt openstack-db02.res01.example.com service.restart postgresql
Finally, configure the secondary for replication:
Note: On the next line change False
to True
at the end of the line in the pillar data. This is a safeguard so that the command is only run when you really want to destroy the data directory. Be sure to include the extra whitespace at the beginning of the command to prevent it from being saved to ~/.bash_history
.
# change False to True on the pillar line to confirm you want to destroy the datadir
salt openstack-db02.res01.example.com state.apply \
openstack-db.setup.configure-secondary-for-replication \
pillar='{"yes_destroy_pg_datadir": "False"}'
salt openstack-db02.res01.example.com service.start postgresql
To verify that replication is working run the following command on the salt-master. You should see t
as the result of the following command:
salt openstack-db02.res01.example.com cmd.run 'psql -c "select pg_is_in_recovery();"' runas=postgres
The above command should result in output similar to below and would indicate a successful replication configuration on openstack-db02
:
openstack-db02.res01.example.com:
pg_is_in_recovery
-------------------
t
(1 row)
This completes the PostgreSQL installation and replication configuration.
OpenStack DB Replication and Failover
Failing Over: When the Primary Fails or is Shut Down
These same states can be used to failover to the secondary in the event of a failure on the master.
Before failing over to a secondary, ensure there is no database traffic being sent to the original master, which would be openstack-db01
in this case.
To failover to the secondary (openstack-db02
in our case), simply use salt to promote it to become master:
salt openstack-db02.res01.example.com cmd.run "pg_ctlcluster 12 main promote"
Next, verify that openstack-db02.res01.example.com
is no longer in recovery mode:
salt openstack-db02.res01.example.com cmd.run 'psql -c "select pg_is_in_recovery();"' runas=postgres
The above command should now return f
for pg_is_in_recovery
, which indicates it is no longer recovering/streaming from openstack-db01.res01.example.com
.
openstack-db02.res01.example.com:
pg_is_in_recovery
-------------------
f
(1 row)
We have now failed over to openstack-db02.res01.example.com
and it is acting as master. Ensure database traffic is now being sent to this new master.
Failing Back: Promoting the Original Primary Back to Master
Once the original primary openstack-db server (openstack-db01
) is back online then we can fail back to it. Make sure the server has been fully highstated first:
salt openstack-db01.res01.example.com state.highstate
The next step will DESTROY the database directory on the original primary so be careful and follow the directions exactly.
# change False to True on the pillar line to confirm you want to destroy the datadir
salt openstack-db01.res01.example.com state.apply \
openstack-db.setup.configure-secondary-for-replication \
pillar='{"yes_destroy_pg_datadir": "False"}'
salt openstack-db01.res01.example.com service.start postgresql
Continue running the following two commands until the lsn
fields match on both the master (openstack-db02
) and secondary (openstack-db01
):
salt openstack-db02.res01.example.com cmd.run \
"psql -c 'select state, sent_lsn, write_lsn, flush_lsn, replay_lsn from pg_stat_replication;'" runas=postgres
salt openstack-db01.res01.example.com cmd.run \
"psql -c 'select status, received_lsn, latest_end_lsn from pg_stat_wal_receiver;'" runas=postgres
Replication is caught up when the lsn
fields match. Next we run a final CHECKPOINT
on openstack-db02.res01.example.com
and then immediately promote openstack-db01.res01.example.com
back to master:
salt openstack-db02.res01.example.com cmd.run "psql -c 'CHECKPOINT;'" runas=postgres
salt openstack-db01.res01.example.com cmd.run "pg_ctlcluster 12 main promote"
Ensure database traffic is now being sent to the original master.
Reconfiguring openstack-db02 as Secondary
We now need to return the openstack-db02.res01.example.com
server to the secondary configuration:
# change False to True on the pillar line to confirm you want to destroy the datadir
salt openstack-db02.res01.example.com state.apply \
openstack-db.setup.configure-secondary-for-replication \
pillar='{"yes_destroy_pg_datadir": "False"}'
salt openstack-db02.res01.example.com service.start postgresql
Check that replication is working, run this on openstack-db02
, you should see t
as the result of the following command:
salt openstack-db02.res01.example.com cmd.run 'psql -c "select pg_is_in_recovery();"' runas=postgres
The above command should result in output similar to below and would indicate a successful replication configuration on openstack-db02
:
openstack-db02.res01.example.com:
pg_is_in_recovery
-------------------
t
(1 row)
This completes the replication configuration.
Comments