Home lab update - mid 2025

It’s been about half a year since the last update on my home lab setup. Initially I planned to do this annually, but a lot has changed since the last time, so I decided to do a quick update.

Overview

A big change since the last update is that I moved to a new place and gained control over my own router (and hence the whole network!). That made the architecture much simpler, since I could drop some hacky workarounds.

High-level architecture overview

External-facing applications

These are the applications that are accessible from the internet. Some of them are for my personal use.

Application	Description
Domain entrypoint	Landing page for my domain
Portfolio website	My personal portfolio website
This blog	My personal blog
Review Planner	A tool for planning code reviews (WIP on GitHub)
Commafeed	A self-hosted RSS feed reader
Linkwarden	A self-hosted website archiving service
Static file server	An nginx file server that I use for sharing my resume, etc.

CI/CD

ArgoCD continues to be my go-to tool for GitOps. It allows me to manage my Kubernetes resources declaratively in a git repository, and automatically deploys changes to the cluster.

In the past few months, I focused on setting up proper CI/CD pipelines for my applications. Now all my applications are built and deployed automatically when I push changes to the repository.

CI/CD workflow

For each application, when a new change lands in the repository:

GitHub Actions pushes a new image to the container registry.
ArgoCD Image Updater picks it up and updates the manifest repository with the new image tag.
ArgoCD detects the change in the manifest repository and deploys the new image to the cluster.
Whenever a deployment changes, ArgoCD sends a notification to a Slack channel.
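
The image-update step in the middle of this flow can be sketched in Python. This is only an illustration of the idea (pick the newest tag, write it back to the manifest), not how ArgoCD Image Updater is actually implemented. The function names, the semver-only tag policy, and the registry.example.com image path are all assumptions made up for the example.

```python
import re

# Hypothetical helpers that mimic "pick the newest tag and write it back to
# the manifest". This is NOT ArgoCD Image Updater's real code or API.
SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def newest_tag(tags):
    """Return the highest semver tag, ignoring tags such as 'latest'."""
    versions = []
    for tag in tags:
        match = SEMVER.match(tag)
        if match:
            # Compare numerically so that 1.10.2 sorts above 1.9.7.
            versions.append((tuple(int(part) for part in match.groups()), tag))
    if not versions:
        raise ValueError("no semver tags found")
    return max(versions)[1]


def bump_image(manifest_text, image, tag):
    """Rewrite every 'image: <image>:<old tag>' reference to use the given tag."""
    pattern = re.compile(r"(image:\s*" + re.escape(image) + r"):\S+")
    return pattern.sub(lambda m: m.group(1) + ":" + tag, manifest_text)


if __name__ == "__main__":
    # Assumed example manifest; the image path is a placeholder.
    manifest = (
        "containers:\n"
        "  - name: blog\n"
        "    image: registry.example.com/blog:1.4.0\n"
    )
    tag = newest_tag(["1.4.0", "latest", "1.10.2", "1.9.7"])
    print(bump_image(manifest, "registry.example.com/blog", tag))
```

In the real setup the write-back is a commit to the manifest repository, which is what ArgoCD then notices and syncs.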

Observability

I finally have a centralized log aggregation mechanism configured. My initial approach was to use Promtail, Loki, and Grafana, but I found it to be overkill for my needs. All I wanted was highly available log storage that let me look through the logs with vim, less, and similar tools.

So I settled on Fluent Bit to collect logs from all pods and send them to a syslog-ng pod that's also running in the cluster. The syslog-ng pod is configured to store logs in a persistent volume backed by CephFS, which I can access easily from my home network.

Log streaming

The result looks something like this:

|-- blog <- namespace
|   |-- blog.log <- current log file
|   |-- blog.log-20250601 <- archived log files
|   |-- blog.log-20250602
|   |-- blog.log-20250603
|   |-- remark42.log
|   |-- remark42.log-20250601
|   |-- remark42.log-20250602
|   `-- remark42.log-20250603
|-- blog-stg
|   |-- blog.log
|   |-- blog.log-20250601
|   |-- blog.log-20250602
|   |-- blog.log-20250603
|   |-- remark42.log
|   |-- remark42.log-20250601
|   |-- remark42.log-20250602
|   `-- remark42.log-20250603
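
The layout above is regular enough to compute paths for it. Here is a minimal Python sketch, assuming the volume is mounted at /mnt/logs (a made-up mount point) and that archived files always carry a YYYYMMDD suffix, as in the listing. The helper names are hypothetical.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath

# Assumed mount point of the CephFS-backed volume; not taken from the post.
LOG_ROOT = PurePosixPath("/mnt/logs")


def current_log(namespace, app):
    """Path of the live log file, e.g. <root>/blog/remark42.log."""
    return LOG_ROOT / namespace / f"{app}.log"


def archived_log(namespace, app, day):
    """Path of the archived log for one day, e.g. <root>/blog/blog.log-20250601."""
    return LOG_ROOT / namespace / f"{app}.log-{day:%Y%m%d}"


def logs_between(namespace, app, first, last):
    """Archived log paths for every day from first to last, in date order."""
    days = (last - first).days
    return [archived_log(namespace, app, first + timedelta(days=n))
            for n in range(days + 1)]


if __name__ == "__main__":
    for path in logs_between("blog", "blog", date(2025, 6, 1), date(2025, 6, 3)):
        print(path)
    print(current_log("blog", "blog"))
```

The printed paths can be handed straight to less or vim, which is the whole point of keeping the logs as plain files.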

I also deployed a homepage for myself to get to different applications / management consoles quickly.

homepage

Kubernetes log collection using Fluent Bit and syslog

Infrastructure

A lot has happened to the infrastructure in my home lab since the last update. I migrated from KVM with libvirt to Proxmox, and set up a Ceph cluster for distributed storage. As a result, the whole architecture became simpler to manage and more reliable.

Two additional Dell Optiplexes were added to the cluster, bringing the total number of physical hosts to five.

Infrastructure

In the post "Load balancer for Proxmox cluster", I set up a load balancer for the Proxmox management interface, so that I can reach it through the single hostname pmx.i.junyi.me.

Network

Following are the primary upgrades to my network setup:

Ditched Tailscale and configured port-forwarding directly on my router.
Set up bind9 as a local DNS server for internal domain resolution, paired with traefik as a reverse proxy. For example, blog.junyi.me is the publicly accessible blog, but blog.i.junyi.me is the internal domain that points to the testing version of the blog.
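
The naming convention behind this split can be sketched in a few lines of Python. This is just an illustration of the hostname scheme, not how bind9 or traefik make their decisions. The classify function is hypothetical, and treating every name under i.junyi.me as internal is my reading of the examples here.

```python
# Assumed suffixes, derived from the example hostnames in this post.
INTERNAL_ZONE = "i.junyi.me"
PUBLIC_ZONE = "junyi.me"


def classify(hostname):
    """Label a hostname as 'internal', 'public', or 'unknown' by its suffix."""
    host = hostname.lower().rstrip(".")  # tolerate a trailing-dot FQDN
    # Check the internal zone first, because it is nested inside the public one.
    if host == INTERNAL_ZONE or host.endswith("." + INTERNAL_ZONE):
        return "internal"
    if host == PUBLIC_ZONE or host.endswith("." + PUBLIC_ZONE):
        return "public"
    return "unknown"


if __name__ == "__main__":
    for name in ["blog.junyi.me", "blog.i.junyi.me", "pmx.i.junyi.me", "example.com"]:
        print(f"{name}: {classify(name)}")
```

The order of the checks matters: every internal name also ends in .junyi.me, so the more specific zone has to be tested first.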

Network flow

As described in the post "Setting up an internal network and DNS with Kubernetes, traefik, and bind9", traefik is responsible for exposing public and internal services on separate IPs (services).

I'm still using MetalLB for load balancing and cert-manager for managing TLS certificates.

Conclusion

It’s been a fun half year. I learned a lot about CI/CD, logging, and infrastructure management, but I still have a lot to learn on each of those topics and more. There are some potential improvements that I can think of right now:

Backup and restore strategy for the Kubernetes cluster
Monitoring and alerting system using something like Prometheus and Grafana
Infrastructure as code using Terraform
Centralized secrets management using something like HashiCorp Vault

As for the hardware side of things, I really want to upgrade my network to 10 Gbps and add some additional storage for backups.