
Auto-renew Proxmox certificates with cert-manager and Ansible

In a previous post, I set up HAProxy as a reverse proxy for Proxmox VE, making the Proxmox web interface accessible via a custom domain. Even if some nodes are down, HAProxy will still route traffic to the available nodes.

However, the Proxmox web interfaces were still using self-signed certificates, which makes browsers freak out from time to time.

The browser's "Potential security risk ahead" warning

Functionality wasn't affected, and it's not much of a risk since I only access it from my internal network, but I figured it was about time to make things right.

My environment

  1. Proxmox VE cluster, accessible on pmx.i.junyi.me
  2. Kubernetes: v1.33.4+k3s1
  3. cert-manager:v1.18.0
  4. haproxy:bookworm

Every Proxmox node was using the default self-signed certificate.

Prepare a certificate

With cert-manager installed in the cluster, issuing a certificate is as easy as creating a Certificate resource.

cert.yml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: junyi-me-pmx
  namespace: cert-manager
spec:
  secretName: junyi-me-pmx
  issuerRef:
    name: letsencrypt-production
    kind: ClusterIssuer
  dnsNames:
  - "*.junyi.me"
  - "pmx.i.junyi.me"

Distribute the certificate

The plan here is to use Ansible to distribute the certificate to all Proxmox nodes, and re-run the job periodically to ensure the certificate is always up-to-date.

Since the SSL certificate lives in a Kubernetes secret, the easiest way to make this happen is through a Kubernetes cronjob running Ansible. But before that, I had to ensure the Ansible job could access the Proxmox nodes via SSH without a password.

ssh-copy-id

So, I wrote a simple Ansible playbook to copy my public SSH key to all Proxmox nodes. This only has to be run once, and again whenever I add a new node to the cluster.
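
If the key pair doesn't exist yet, it can be generated roughly like this (the empty passphrase and the comment are my own choices, so the automated job can use the key non-interactively):

# generate a dedicated ed25519 key pair for the Ansible job
mkdir -p ssh
ssh-keygen -t ed25519 -f ssh/id_ed25519 -N "" -C "ansible-pmx-certs"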

directory structure:

.
├── inventory.ini
├── ssh-copy-id.yml
├── ssh
│   ├── id_ed25519
│   └── id_ed25519.pub
inventory.ini
[all]
root@10.0.69.3
root@10.0.69.4
root@10.0.69.5
root@10.0.69.6
root@10.0.69.9
root@10.0.69.11
root@10.0.69.12
ssh-copy-id.yml
- name: ssh-copy-id to Proxmox nodes
  hosts: all
  gather_facts: yes

  vars:
    ssh_pub_key_path: "{{ playbook_dir }}/ssh/id_ed25519.pub"

  tasks:
    - name: Read local public key from control node
      set_fact:
        pub_key: "{{ lookup('file', ssh_pub_key_path) }}"

    - name: Ensure ~/.ssh exists
      ansible.builtin.file:
        path: /root/.ssh
        state: directory
        mode: '0700'
        owner: root
        group: root

    - name: Add public key to authorized_keys
      ansible.builtin.authorized_key:
        user: root
        key: "{{ pub_key }}"
        state: present
        manage_dir: no

This playbook will:

  1. Read the public key from the control node (where Ansible is run)
  2. Ensure the .ssh directory exists on each Proxmox node
  3. Add the public key to the authorized_keys file on each node

Run the playbook:

ansible-playbook -i inventory.ini ssh-copy-id.yml
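
The very first run may need --ask-pass (and sshpass on the control node), since no key is installed on the hosts yet. Afterwards, a quick ad-hoc ping confirms that passwordless SSH works, assuming the private key sits next to the playbook as shown above:

ansible all -i inventory.ini -m ping --private-key ssh/id_ed25519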

Keep certs up-to-date

With the SSH access set up, it’s time to write the Ansible playbook to distribute the certificates.

ssh-secret.yml
apiVersion: v1
kind: Secret
metadata:
  name: pmx-hosts-ssh
  namespace: cert-manager
stringData:
  id_ed25519: |
    -----BEGIN OPENSSH PRIVATE KEY-----
    <redacted>
    -----END OPENSSH PRIVATE KEY-----    
  id_ed25519.pub: |
    <redacted>    
certificate.yml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: junyi-me-pmx
  namespace: cert-manager
spec:
  secretName: junyi-me-pmx
  issuerRef:
    name: letsencrypt-production
    kind: ClusterIssuer
  dnsNames:
  - "*.junyi.me"
  - "pmx.i.junyi.me"

The first manifest is a Secret containing the SSH key pair, the same one copied to all Proxmox nodes in the previous step.
The second is the Certificate resource from earlier, which generates a secret named junyi-me-pmx containing the TLS certificate and private key. That secret is what I want to distribute to all Proxmox nodes.
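
Instead of pasting the key material into YAML by hand, the SSH secret can also be generated from the key files created earlier, roughly like this:

kubectl create secret generic pmx-hosts-ssh \
  --namespace cert-manager \
  --from-file=id_ed25519=ssh/id_ed25519 \
  --from-file=id_ed25519.pub=ssh/id_ed25519.pub \
  --dry-run=client -o yaml > ssh-secret.yml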

configmap.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ansible-pmx-certs
  namespace: cert-manager
data:
  propagate.yml: |
    - name: Copy SSL certs to Proxmox nodes
      hosts: all
      become: yes
      gather_facts: yes
      vars:
        ansible_ssh_common_args: "-o StrictHostKeyChecking=no"
      tasks:
      - name: Copy certificate
        copy:
          src: /certs/tls.crt
          dest: "/etc/pve/nodes/{{ ansible_hostname }}/pve-ssl.pem"

      - name: Copy private key
        copy:
          src: /certs/tls.key
          dest: "/etc/pve/nodes/{{ ansible_hostname }}/pve-ssl.key"

      - name: Restart Proxmox services to load new SSL certs
        systemd:
          name: "{{ item }}"
          state: restarted
        loop:
        - pveproxy    
  hosts.ini: |
    [all]
    10.0.69.3
    10.0.69.4
    10.0.69.5
    10.0.69.6
    10.0.69.9
    10.0.69.11
    10.0.69.12    

I decided to include both the playbook and the inventory file in a single ConfigMap, just to keep the configuration in one place. If I ever add another Ansible playbook for the same set of hosts, it might be worth splitting them up.

cronjob.yml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: pmx-hosts-cert
  namespace: cert-manager
spec:
  timeZone: 'America/Denver'
  schedule: "0 0 * * 6"
  jobTemplate:
    spec:
      backoffLimit: 1
      template:
        spec:
          containers:
          - name: propagate
            image: alpine/ansible
            command: ["ansible-playbook", "-i", "/etc/ansible-pmx/hosts.ini", "/etc/ansible-pmx/propagate.yml"]
            volumeMounts:
            - name: ansible-config
              mountPath: /etc/ansible-pmx
            - name: ssh-key
              mountPath: /root/.ssh
              readOnly: true
            - name: tls-certs
              mountPath: /certs
              readOnly: true
          restartPolicy: Never
          volumes:
          - name: ansible-config
            configMap:
              name: ansible-pmx-certs
          - name: ssh-key
            secret:
              secretName: pmx-hosts-ssh
              items:
              - key: id_ed25519
                path: id_ed25519
              defaultMode: 0o600
          - name: tls-certs
            secret:
              secretName: junyi-me-pmx

This cronjob will run every Saturday at midnight (0 0 * * 6), which is frequent enough since Let’s Encrypt certificates are valid for 90 days.

On each run, it will:

  1. Copy both the certificate and private key to each Proxmox node
  2. Restart the pveproxy service to load the new certificates
Tip:

Technically the files only need to be copied to a single node, since Proxmox syncs the contents of /etc/pve across the cluster using corosync, but copying to all nodes doesn't hurt, and it keeps the playbook simpler.

I just applied everything here, tested a few times, and waited for Saturday to come.
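
Testing doesn't have to wait for Saturday; a one-off run can be kicked off from the CronJob, roughly like this (the job name is arbitrary):

kubectl apply -f ssh-secret.yml -f certificate.yml -f configmap.yml -f cronjob.yml
# trigger a manual run instead of waiting for the schedule
kubectl create job pmx-hosts-cert-test --from=cronjob/pmx-hosts-cert -n cert-manager
kubectl logs -n cert-manager job/pmx-hosts-cert-test -f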

Conclusion

Since setting this up, a few Saturdays have passed, and it seems to have been running happily.

PLAY RECAP *********************************************************************
10.0.69.10                 : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
10.0.69.2                  : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
10.0.69.3                  : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
10.0.69.4                  : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
10.0.69.5                  : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
10.0.69.6                  : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
10.0.69.7                  : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
10.0.69.9                  : ok=4    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
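
To double-check that a node is really serving the new certificate, the issuer and expiry can be inspected on the Proxmox web port (8006 by default), for example against the first node in the inventory:

echo | openssl s_client -connect 10.0.69.3:8006 2>/dev/null \
  | openssl x509 -noout -issuer -enddate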

Hopefully I will never have to see the “Potential security risk ahead” warning again.
