
Automated backups of Ceph RGW

I have set up Ceph RGW to provide object storage for my self-hosted applications. Naturally, I want a reliable backup strategy to ensure all that data is safe.

Although Ceph is designed for redundancy and fault tolerance, its primary purpose is high availability, not backup. For instance, if I lose the Ceph cluster configuration and am left with only the raw data, recovering the RGW data can be very challenging.

So the end goal is to periodically back up all RGW data to non-Ceph storage, such as an external ZFS pool. For now, since I haven’t set up such a pool, I’m just simulating the backup process by copying data to a CephFS directory.

Note

If a bucket contains an absurdly large number of objects, backing up to a file system might introduce performance issues or even hit inode limits. In my case I only plan to use RGW for a handful of services, so this approach should be good enough (at least for a while).

Environment

  1. Ceph: 19.2.3 Squid, installed via the Proxmox APT repository
  2. Kubernetes: v1.33.4+k3s1

Backup cronjob

My favorite way to run scheduled tasks is still a Kubernetes CronJob.

I decided to use the AWS CLI for interacting with RGW: RGW is compatible with the S3 API, and the AWS CLI provides a readily available official container image.
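The job authenticates with an S3 key pair stored in a Secret (shown further down). If you don’t have one yet, a key can be minted on the Ceph side with radosgw-admin; the uid and display name below are hypothetical, and note that the user needs sufficient privileges to list and read every bucket being backed up:

# prints the generated access_key and secret_key in its JSON output
radosgw-admin user create --uid=backup --display-name="RGW backup"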

The goal is to keep a fairly recent copy of each RGW bucket in the backup storage. Since I’m not aiming for something like point-in-time recovery, I decided to just use aws s3 sync to perform incremental backups.

script-cm.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rgw-backup-script
  namespace: backup
data:
  backup.sh: |
    #!/bin/sh
    set -e

    echo "Starting Ceph RGW backup"
    BACKUP_DIR="/backup"

    # sync all buckets incrementally; --delete mirrors deletions so the
    # backup matches each bucket's current state (no point-in-time history)
    aws s3 ls --endpoint-url "${RGW_ENDPOINT}" | awk '{print $3}' | while read -r bucket; do
      if [ -n "$bucket" ]; then
        echo "Syncing bucket: ${bucket}"
        aws s3 sync --endpoint-url "${RGW_ENDPOINT}" "s3://${bucket}" "${BACKUP_DIR}/${bucket}" --delete
      fi
    done

    echo "Backup completed"
    du -sh "${BACKUP_DIR}"
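Restoring is just the same sync in the opposite direction. A minimal, untested sketch, with <bucket> as a placeholder (the bucket must already exist on the RGW side):

aws s3 sync --endpoint-url "${RGW_ENDPOINT}" "/backup/<bucket>" "s3://<bucket>"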
cronjob.yml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: rgw-backup
  namespace: backup
spec:
  schedule: "0 4 * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: rgw-backup
            image: amazon/aws-cli:latest
            command:
            - /bin/sh
            - /scripts/backup.sh
            envFrom:
            - secretRef:
                name: rgw-admin
            volumeMounts:
            - name: backup-script
              mountPath: /scripts
            - name: backup-storage
              mountPath: /backup
          volumes:
          - name: backup-script
            configMap:
              name: rgw-backup-script
              defaultMode: 0755
          - name: backup-storage
            persistentVolumeClaim:
              claimName: rgw-bck
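To test the job without waiting for 4 AM, kubectl can spawn a one-off Job from the CronJob (the job name here is arbitrary):

kubectl -n backup create job rgw-backup-manual --from=cronjob/rgw-backup
kubectl -n backup logs -f job/rgw-backup-manual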
secret.yml
apiVersion: v1
kind: Secret
metadata:
  name: rgw-admin
  namespace: backup
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: <access-key-id>
  AWS_SECRET_ACCESS_KEY: <secret-access-key>
  RGW_ENDPOINT: https://rgw.i.junyi.me
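The credentials can be sanity-checked from any machine with the AWS CLI before wiring them into the CronJob; if they work, this lists all buckets visible to the user:

AWS_ACCESS_KEY_ID=<access-key-id> \
AWS_SECRET_ACCESS_KEY=<secret-access-key> \
aws s3 ls --endpoint-url https://rgw.i.junyi.me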
volume.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: rgw-bck
spec:
  storageClassName: csi-cephfs-sc
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 1Gi
  csi:
    driver: cephfs.csi.ceph.com
    nodeStageSecretRef:
      name: csi-cephfs-secret
      namespace: ceph-csi-cephfs
    volumeAttributes:
      "fsName": "<fs-name>"
      "clusterID": "<cluster-id>"
      "staticVolume": "true"
      "rootPath": /backup/rgw
    volumeHandle: rgw-bck
  persistentVolumeReclaimPolicy: Retain
  volumeMode: Filesystem

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rgw-bck
  namespace: backup
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: "csi-cephfs-sc"
  volumeMode: Filesystem
  volumeName: rgw-bck
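One gotcha with statically provisioned CephFS volumes: ceph-csi does not create the rootPath for you, so /backup/rgw has to exist in CephFS before the volume can be mounted. A minimal sketch, assuming CephFS is already mounted at /mnt/cephfs on some admin node:

# the static PV's rootPath must pre-exist
mkdir -p /mnt/cephfs/backup/rgw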

Conclusion

I now have an automated backup job that dumps all my buckets into the backup directory:

jy@opx02:/mnt/bd/backup/rgw$ ls
git-lfs           gitlab-backups          gitlab-dependency-proxy  gitlab-packages  gitlab-terraform-state  registry      tmp
gitlab-artifacts  gitlab-ci-secure-files  gitlab-mr-diffs          gitlab-pages     gitlab-uploads          runner-cache

Since I now have object storage in place, and it will be backed up properly in the future, I can finally configure CNPG to back up to object storage instead of relying on a custom pg_dump job.
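For reference, CNPG supports S3-compatible object stores through its barmanObjectStore backup configuration. A minimal sketch pointing at the same RGW endpoint; the cluster name, bucket, and secret here are hypothetical:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg
spec:
  instances: 3
  storage:
    size: 10Gi
  backup:
    barmanObjectStore:
      # hypothetical bucket on the same RGW endpoint
      destinationPath: s3://cnpg-backups
      endpointURL: https://rgw.i.junyi.me
      s3Credentials:
        accessKeyId:
          name: cnpg-s3-creds
          key: AWS_ACCESS_KEY_ID
        secretAccessKey:
          name: cnpg-s3-creds
          key: AWS_SECRET_ACCESS_KEY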
