
Migrating databases to CloudnativePG

A lot of my self-hosted services use PostgreSQL as their database. As the number of services grows, managing and backing up those databases becomes a chore. As described in a previous post, I set up pgAdmin to manage all PostgreSQL databases in a single place, and a Kubernetes Job to back up all databases at once.

However, that setup did not solve all issues. Namely, setting up a new DB was still a chore involving the following steps:

  1. Set up a PVC and a Deployment for the new PostgreSQL database
  2. Add the new database to pgAdmin’s configmap
  3. Add it to the backup job’s configmap/secret

The configuration of the backup job was highly manual. Each time I added a new DB, I risked misconfiguring the backup job or forgetting to add the new database to it.

CloudnativePG

CloudnativePG is a Kubernetes operator for PostgreSQL. It makes it easy to set up and manage PostgreSQL clusters in Kubernetes, and it has built-in support for HA with automatic failover.

I figured that if I migrated all of my DBs into it, I could use its built-in backup feature to back up all databases in a consistent way.

So my initial idea was to set up a separate cluster for each existing service and then migrate the data from the old DB to the new one. That would have been the “production-ready” way to do things.

Looking at the resources in my cluster, however, I reckoned that running multiple CloudnativePG clusters would be too resource-intensive and definitely overkill for my home lab. So instead, I decided to set up a single CloudnativePG cluster with multiple databases inside.

Environment

  1. Kubernetes cluster: v1.33.3+k3s1
  2. CloudnativePG: 1.27.0, installed via Helm chart 0.26.0

Installing CloudnativePG

CloudnativePG’s Helm chart has a sensible set of defaults, so this was all the configuration I needed:

cnpg.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: cloudnative-pg
  namespace: argocd
spec:
  destination:
    namespace: postgres
    server: https://kubernetes.default.svc
  project: default
  source:
    chart: cloudnative-pg
    repoURL: https://cloudnative-pg.github.io/charts
    targetRevision: 0.26.0
    helm:
      values: |-
        config:
          clusterWide: false

        monitoring:
          podMonitorEnabled: true

        additionalEnv:
        - name: INHERITED_ANNOTATIONS
          value: prometheus.io/*        
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
    - ServerSideApply=true

I set clusterWide to false to restrict CloudnativePG to the postgres namespace only, since I only plan to deploy my DBs in a single namespace.

The other options enable monitoring with Prometheus and Grafana.
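
For reference, installing the chart directly with the Helm CLI instead of an Argo CD Application would look roughly like this. This is a sketch with the same values as above, minus the additionalEnv part:

helm repo add cnpg https://cloudnative-pg.github.io/charts
helm upgrade --install cloudnative-pg cnpg/cloudnative-pg \
  --namespace postgres --create-namespace \
  --version 0.26.0 \
  --set config.clusterWide=false \
  --set monitoring.podMonitorEnabled=true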

Creating a PostgreSQL cluster

Then I proceeded to create a PostgreSQL cluster with the following manifest:

central.yml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: central
  namespace: postgres
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9187"
    prometheus.io/path: "/metrics"
spec:
  instances: 3
  storage:
    size: 10Gi
  resources:
    requests:
      cpu: "4"
      memory: "8Gi"
    limits:
      cpu: "4"
      memory: "8Gi"
  postgresql:
    parameters:
      max_connections: "400"
      shared_buffers: "2GB"
  managed:
    services:
      additional:
      - selectorType: rw
        serviceTemplate:
          metadata:
            name: central-rw-lb
          spec:
            type: LoadBalancer
            loadBalancerIP: 10.0.69.234
            ports:
            - name: postgres
              protocol: TCP
              port: 5432
              targetPort: 5432

This creates a PostgreSQL cluster named central with 3 instances: one primary and two replicas. Each instance gets 10Gi of storage, 4 CPUs, and 8Gi of memory.

I set max_connections to 400 to allow enough connections for all my services, and shared_buffers to 2GB (25% of the 8Gi allocated to each instance), following the recommendation in the CNPG docs:

A reasonable starting value for shared_buffers is 25% of the memory in your system. For example: if your shared_buffers is 256 MB, then the recommended value for your container memory size is 1 GB, which means that within a pod all the containers will have a total of 1 GB memory that Kubernetes will always preserve, enabling our containers to work as expected. For more details, please refer to the “Resource Consumption” section in the PostgreSQL documentation.

So far, the suggested 25% of memory rule is working well for me.
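
Once the manifest is applied, the operator brings up the instances and promotes one of them to primary. A quick sanity check looks like this (the second command assumes the cnpg kubectl plugin is installed; plain kubectl get works without it):

kubectl get cluster -n postgres central
kubectl cnpg status central -n postgres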

Data migration

It’s time for data migration.

Exporting is easy. I just used pg_dump to dump the data from the old DB, like:

k exec sjwt-db-deployment-684469fc58-drlds -- pg_dump -U exadmin sjwt > ~/tmp/sjwt.sql

Then, for each database, I created a new database and user in the new CloudnativePG cluster:

CREATE DATABASE sjwt;
CREATE USER exadmin WITH ENCRYPTED PASSWORD 'mypassword';
GRANT ALL PRIVILEGES ON DATABASE sjwt TO exadmin;
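
One caveat worth noting: assuming the new cluster runs PostgreSQL 15 or later (the operator’s default images do), the public schema no longer grants CREATE to ordinary users, so the service user also needs privileges on the schema (or ownership of the database), otherwise the application can’t create its tables. A minimal sketch, using the same database and user as above:

kubectl exec central-1 -- psql -U postgres -d sjwt \
  -c "ALTER DATABASE sjwt OWNER TO exadmin;" \
  -c "GRANT ALL ON SCHEMA public TO exadmin;"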

Finally, import the data with psql:

kubectl exec -i central-1 -- psql -U postgres sjwt < ~/tmp/sjwt.sql
tip

I should take some time to learn pg_dump -Fc and pg_restore to streamline the dump-and-restore process.
With the small databases I currently have, the plain-text format works fine, but for larger databases, the compression and selective-restore features of the custom format seem very useful.
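
For the record, a rough sketch of what that custom-format round trip would look like with the same pods as above (--no-owner is optional; it skips restoring the original ownership since the restore runs as postgres):

# dump in the compressed custom format
kubectl exec sjwt-db-deployment-684469fc58-drlds -- pg_dump -U exadmin -Fc sjwt > ~/tmp/sjwt.dump

# restore into the new cluster
kubectl exec -i central-1 -- pg_restore -U postgres -d sjwt --no-owner < ~/tmp/sjwt.dump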

Set up pgAdmin

For pgAdmin, I’m still using the same configuration pattern as before; now there is only one server to connect to.

pg-servers.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: pg-servers
  namespace: admin
data:
  dbs.json: |
    {
      "Servers": {
        "1": {
          "Name": "central",
          "Group": "Servers",
          "Host": "central-rw.postgres.svc.cluster.local",
          "Port": 5432,
          "MaintenanceDB": "postgres",
          "Username": "admin",
          "UseSSHTunnel": 0,
          "TunnelPort": "22",
          "TunnelAuthentication": 0,
          "TunnelKeepAlive": 0,
          "KerberosAuthentication": false,
          "ConnectionParameters": {
            "sslmode": "prefer",
            "connect_timeout": 10
          },
          "Tags": []
        }
      }
    }    

Backups

CloudnativePG has built-in support for backups, which is documented here. It seems very straightforward to set up with some kind of object storage like S3, GCS, or Azure Blob Storage.

For me, since I don’t have an object storage service set up just yet, I resorted to a custom k8s cronjob instead.

info

Since I already have a Ceph cluster, my plan here is to set up Ceph RGW as an S3-compatible object storage service.
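
Once that RGW endpoint exists, the built-in backups should only need a barmanObjectStore section on the Cluster plus a ScheduledBackup resource, roughly like the sketch below. The endpoint, bucket, and secret names are placeholders for my future setup:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: central
  namespace: postgres
spec:
  # ...existing spec from above...
  backup:
    retentionPolicy: "30d"
    barmanObjectStore:
      destinationPath: s3://pg-backups/central
      endpointURL: http://rgw.ceph.svc.cluster.local
      s3Credentials:
        accessKeyId:
          name: rgw-credentials
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: rgw-credentials
          key: SECRET_ACCESS_KEY
---
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: central-daily
  namespace: postgres
spec:
  schedule: "0 0 2 * * *"  # six fields, seconds first
  cluster:
    name: central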

The following secret holds the connection information to the database. It uses the central-ro service created by CloudnativePG to connect to a read-only replica.

secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: central-admin
stringData:
  POSTGRES_USER: <redacted>
  POSTGRES_PASSWORD: <redacted>
  POSTGRES_HOST: central-ro

The cronjob itself looks like this:

cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: central-dumpall
spec:
  jobTemplate:
    metadata:
      name: central-dumpall
    spec:
      template:
        metadata:
          name: central-dumpall
        spec:
          containers:
          - image: postgres:17
            name: central-dumpall
            envFrom:
            - secretRef:
                name: central-admin
            command: ["/bin/bash", "-c"]
            args:
            - |
              POSTFIX=central_$(date +%Y%m%d%H%M%S)
              TARNAME=pgdump_${POSTFIX}
              TMPDIR=/tmp/pgdump
              OUTDIR=${TMPDIR}/${TARNAME}

              mkdir -p "$OUTDIR"

              export PGPASSWORD=$POSTGRES_PASSWORD
              for db in $(psql -h $POSTGRES_HOST -U $POSTGRES_USER -d postgres -t -c "SELECT datname FROM pg_database WHERE datistemplate = false;"); do
                  if [[ -n "$db" ]]; then
                      echo "Backing up $db"
                      pg_dump -h $POSTGRES_HOST -U $POSTGRES_USER "$db" > "${OUTDIR}/${db}_${POSTFIX}.sql"
                  fi
              done

              pg_dumpall -h $POSTGRES_HOST -U $POSTGRES_USER --globals-only > "${OUTDIR}/globals_${POSTFIX}.sql"
              pg_dumpall -h $POSTGRES_HOST -U $POSTGRES_USER > "${OUTDIR}/full_${POSTFIX}.sql"

              tar czf /backup/${TARNAME}.tar.gz -C "$TMPDIR" "$TARNAME"

              echo "Done!"              
            volumeMounts:
            - name: db-bck
              mountPath: /backup
          restartPolicy: OnFailure
          volumes:
          - name: db-bck
            persistentVolumeClaim:
              claimName: db-bck
  schedule: "0 2 * * *"

This cronjob runs every day at 2am, and does the following:

  1. Scan for all databases in the cluster
  2. Dump each database into a separate SQL file
  3. Dump global objects (roles, tablespaces, etc.) into a separate SQL file
  4. Create another SQL file with all databases dumped in one file
  5. Tar and compress all SQL files into a single archive
  6. Store the archive in a PVC

The advantage of this over the previous setup is that it automatically scans for all databases, so I don’t have to manually add new databases to the backup job. It also stores the compressed dumps in a single archive, which is easier to manage.
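
For completeness, restoring a single database from one of these archives would go roughly like this (file names are illustrative; the real ones carry the timestamp, and writes have to go through the rw service rather than central-ro):

# unpack the archive from the backup PVC
tar xzf pgdump_central_20250101020000.tar.gz
cd pgdump_central_20250101020000

# restore roles and other globals first, then the database itself
# (the target database must already exist, since pg_dump ran without -C)
psql -h central-rw -U postgres -d postgres -f globals_central_20250101020000.sql
psql -h central-rw -U postgres -d sjwt -f sjwt_central_20250101020000.sql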

tip

Again, the pg_dump -Fc and pg_restore combo seems useful here.
Or if I properly set up Ceph RGW, maybe this cronjob can be ditched altogether.

Connection

From inside the cluster, the connection string for a database in the new cluster looks like this:

postgres://<user>:<password>@central-rw.postgres.svc.cluster.local/<database>

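A quick way to sanity-check a service’s credentials is a throwaway pod (the user, password, and database below are placeholders):

kubectl run psql-test --rm -it --restart=Never --image=postgres:17 -- \
  psql "postgres://exadmin:mypassword@central-rw.postgres.svc.cluster.local/sjwt" -c '\conninfo'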

Monitoring

For monitoring, I just needed to follow the instructions in the official docs to set up a dashboard in Grafana. The rest was already taken care of by the operator.

Grafana dashboard

Conclusion

I have left this setup running for a few weeks now, and so far DB connections seem stable and backups complete successfully every day.

There are things I look forward to improving in the future:

  1. Set up Ceph RGW for CNPG’s built-in backup feature
  2. Figure out connection pooling with PgBouncer - I’ve tried it once, but couldn’t resolve a permission issue.
  3. Set up separate clusters for larger production databases - only once backups are properly in place