Kubernetes log collection using Fluent Bit and syslog

I knew log management was going to be an issue from the very beginning of setting up my Kubernetes cluster. Four months later, I finally have a solution that works for me.

Overview

I used fluent-bit to continuously tail my container logs, and syslog-ng to collect and store them.

This setup has the following advantages compared to relying on kubectl logs:

  1. Logs are stored in a central location, not separated across nodes
  2. They are stored persistently
  3. Logs are decoupled from Kubernetes resources, i.e. they are not lost when a pod is deleted, or even when the entire cluster is deleted

Prerequisites

  1. A Kubernetes cluster
  2. Some kind of storage class available in the cluster

Set up syslog-ng

This syslog-ng pod will serve as the destination for all logs collected by fluent-bit. It will later be referenced in the fluent-bit configuration.

apiVersion: v1
kind: ConfigMap
metadata:
  name: syslog
  namespace: monitoring
data:
  syslog-ng.conf: |
    @version: 4.2
    @include "scl.conf"

    options {
        time-zone("America/Denver");
    };

    source s_fluentd_tcp {
        syslog(port(514) transport("tcp") flags(syslog-protocol));
    };

    source s_fluentd_udp {
        syslog(port(514) transport("udp") flags(syslog-protocol));
    };

    # Dynamic log path with Namespace, Pod, and Container
    destination d_namespace_logs {
        file("/var/log/k8s/${.SDATA.kubernetes.namespace_name}/${.SDATA.kubernetes.app}.log"
            create-dirs(yes)
            template("$ISODATE ${.SDATA.kubernetes.pod_name} ${.SDATA.kubernetes.container_name} $MSG\n"));
    };

    log {
        source(s_fluentd_tcp);
        source(s_fluentd_udp);
        destination(d_namespace_logs);
    };    

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: syslog
  namespace: monitoring
  labels:
    app: syslog
spec:
  replicas: 1
  selector:
    matchLabels:
      app: syslog
  template:
    metadata:
      labels:
        app: syslog
    spec:
      containers:
      - name: syslog
        image: lscr.io/linuxserver/syslog-ng:latest
        ports:
          - containerPort: 514
          - containerPort: 601
          - containerPort: 6514
        env:
          - name: PUID
            value: "1000"
          - name: PGID
            value: "1000"
          - name: TZ
            value: "America/Denver"
        volumeMounts:
        - mountPath: /var/log
          name: syslog
        - mountPath: /config/syslog-ng.conf
          name: syslog-config
          subPath: syslog-ng.conf
      - name: logrotate
        image: blacklabelops/logrotate
        env:
          - name: LOGROTATE_COPIES
            value: "10"
          - name: LOGS_DIRECTORIES
            value: /var/log/
          - name: LOGROTATE_INTERVAL
            value: daily
          - name: LOGROTATE_DATEFORMAT
            value: "-%Y%m%d"
        volumeMounts:
        - mountPath: /var/log
          name: syslog
      securityContext:
        fsGroup: 1000
      volumes:
      - name: syslog
        persistentVolumeClaim:
          claimName: syslog
      - name: syslog-config
        configMap:
          name: syslog

---

apiVersion: v1
kind: Service
metadata:
  name: syslog
  namespace: monitoring
spec:
  selector:
    app: syslog
  ports:
  - protocol: UDP
    port: 514
    name: syslog-udp
    targetPort: 514
  - protocol: TCP
    name: syslog-tcp
    port: 601
    targetPort: 601
  - protocol: TCP
    port: 6514
    targetPort: 6514
    name: syslog-tls

For the persistent volume, I used a storage class backed by CephFS, so that logs are stored redundantly across three storage nodes. The Deployment above expects a PersistentVolumeClaim named syslog in the monitoring namespace. Its configuration is not included here, but any storage class should work.

syslog-ng configuration

In the syslog-ng configuration, I used three macros that are natively available in syslog-ng. They are populated from the messages that fluent-bit sends to syslog-ng.

  1. ${.SDATA.kubernetes.*}: .SDATA holds the structured data of the incoming message as a set of named fields. The kubernetes key is set up later in the fluent-bit configuration to carry the Kubernetes-specific fields.
  2. $ISODATE: The timestamp of the message in ISO 8601 format. There are a few other date formats available.
  3. $MSG: The log message

The above configuration should create a directory structure like this:

.
|-- namespace1
|   |-- app1.log
|   |-- app2.log
|-- namespace2
|   |-- app1.log
|   |-- noapp.log

Logs are stored according to the namespace and app name of the pod that generated them. The app name is taken from the pod's app label, which usually matches the deployment name. If the label is not set, the logs are stored in a file named noapp.log.
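To make the template concrete, here is a small Python sketch that mimics how the file() destination above expands its macros for a single record. It is only an illustration of the path and line layout, not how syslog-ng works internally. The function name render_destination and the sample values are made up for this example, and the fixed UTC-6 offset is just a stand-in for the configured time zone.

```python
from datetime import datetime, timedelta, timezone

LOG_ROOT = "/var/log/k8s"  # same root as the file() destination above


def render_destination(sdata, msg, stamp):
    """Mimic the path and line template of d_namespace_logs (illustration only)."""
    k8s = sdata["kubernetes"]
    # file("/var/log/k8s/${.SDATA.kubernetes.namespace_name}/${.SDATA.kubernetes.app}.log")
    path = f"{LOG_ROOT}/{k8s['namespace_name']}/{k8s['app']}.log"
    # template("$ISODATE ${.SDATA.kubernetes.pod_name} ${.SDATA.kubernetes.container_name} $MSG\n")
    line = f"{stamp.isoformat()} {k8s['pod_name']} {k8s['container_name']} {msg}\n"
    return path, line


# Hypothetical record, shaped like the structured data fluent-bit sends.
sample = {
    "kubernetes": {
        "namespace_name": "namespace1",
        "app": "app1",
        "pod_name": "pod1",
        "container_name": "container1",
    }
}
fixed_offset = timezone(timedelta(hours=-6))  # assumption: stand-in for the configured time zone
path, line = render_destination(
    sample, "hello", datetime(2025, 3, 27, 12, 0, 0, tzinfo=fixed_offset)
)
print(path)           # /var/log/k8s/namespace1/app1.log
print(line, end="")   # 2025-03-27T12:00:00-06:00 pod1 container1 hello
```

Every container of a pod writes to the same file, which is why the pod and container names are part of each line rather than the path.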

Log rotation

Along with the syslog-ng container, I also deployed a logrotate container. This container will rotate logs daily and keep the last 10 copies of each log file.

The result will look like this:

.
|-- namespace1
|   |-- app1.log
|   |-- app1.log-20250327
|   |-- app1.log-20250326

Set up fluent-bit

I used helm to install fluent-bit, with the following configuration:

# values.yaml

serviceAccount:
  create: true
  annotations: {}
  name:

rbac:
  create: true
  nodeAccess: false
  eventsAccess: false

## https://docs.fluentbit.io/manual/administration/configuring-fluent-bit/classic-mode/configuration-file
config:
  service: |
    [SERVICE]
        Daemon Off
        Flush {{ .Values.flush }}
        Log_Level {{ .Values.logLevel }}
        Parsers_File /fluent-bit/etc/parsers.conf
        Parsers_File /fluent-bit/etc/conf/custom_parsers.conf
        HTTP_Server On
        HTTP_Listen 0.0.0.0
        HTTP_Port {{ .Values.metricsPort }}
        Health_Check On    

  ## https://docs.fluentbit.io/manual/pipeline/inputs
  inputs: |
    [INPUT]
        Name tail
        Path /var/log/containers/*.log
        multiline.parser docker, cri
        Tag kube.*
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On

    [INPUT]
        Name systemd
        Tag host.*
        Systemd_Filter _SYSTEMD_UNIT=k3s.service
        Read_From_Tail On    

  ## https://docs.fluentbit.io/manual/pipeline/filters
  filters: |
    [FILTER]
        Name kubernetes
        Match kube.*
        Merge_Log On
        Keep_Log On
        K8S-Logging.Parser On
        K8S-Logging.Exclude on
        Buffer_Size 64KB

    [FILTER]
        Name nest
        Match kube.*
        Operation lift
        Nested_Under kubernetes
        Add_prefix k8s_

    [FILTER]
        Name nest
        Match kube.*
        Operation lift
        Nested_Under k8s_labels
        Add_prefix k8s_

    [FILTER]
        Name modify
        Match kube.*
        Add k8s_app noapp

    [FILTER]
        Name nest
        Match kube.*
        Wildcard k8s_*
        Operation nest
        Nest_under kubernetes
        Remove_prefix k8s_    

  ## https://docs.fluentbit.io/manual/pipeline/outputs
  outputs: |
    [OUTPUT]
        name                 syslog
        match                kube.*
        host                 syslog.monitoring.svc.cluster.local
        port                 514
        mode                 udp
        syslog_format        rfc5424
        syslog_maxsize       2048
        syslog_severity_key  severity
        syslog_facility_key  facility
        syslog_hostname_key  hostname
        syslog_appname_key   appname
        syslog_procid_key    app
        syslog_msgid_key     msgid
        syslog_sd_key        kubernetes
        syslog_message_key   log

    [OUTPUT]
        Name stdout
        Match *
        Format json    

Install fluent-bit with the values:

helm repo add fluent https://fluent.github.io/helm-charts
helm install fluent-bit fluent/fluent-bit -f values.yaml -n monitoring

Now both fluent-bit and syslog-ng should be running, producing logs in the destination configured earlier.
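If nothing shows up, it can help to test syslog-ng on its own, without fluent-bit in the path. The following Python sketch builds one RFC 5424 message by hand and sends it over UDP. It assumes the structured-data element is named kubernetes (matching syslog_sd_key above) and that the script runs somewhere that can reach the syslog Service, for example a throwaway pod inside the cluster. The function names, hostname, and field values are all made up for this test.

```python
import socket
from datetime import datetime, timezone


def sd_escape(value):
    """Escape the characters RFC 5424 reserves inside a structured-data value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")


def build_message(fields, text, hostname="test-host", appname="test"):
    """Build a single RFC 5424 message with one structured-data element."""
    # PRI 14 = facility 1 (user) * 8 + severity 6 (informational)
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    params = " ".join(f'{key}="{sd_escape(val)}"' for key, val in fields.items())
    # PROCID and MSGID are left as "-" (nil value)
    message = f"<14>1 {stamp} {hostname} {appname} - - [kubernetes {params}] {text}"
    return message.encode("utf-8")


def send_test_message(host, port=514):
    """Send one hand-built message over UDP and return the bytes that were sent."""
    payload = build_message(
        {
            "namespace_name": "test-namespace",  # hypothetical values
            "app": "test-app",
            "pod_name": "test-pod",
            "container_name": "test-container",
        },
        "hello from the test sender",
    )
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload


# Example call, from inside the cluster:
# send_test_message("syslog.monitoring.svc.cluster.local")
```

With the syslog-ng configuration above, this message should end up in /var/log/k8s/test-namespace/test-app.log. If it does, syslog-ng is fine and the problem is on the fluent-bit side.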

Filters

Here, I’m using the kubernetes filter to enrich the logs generated by Kubernetes pods with metadata. The filter adds a field called kubernetes to each record, which contains all the Kubernetes-specific fields.

However, the structure will be nested like this:

kubernetes:
  namespace_name: namespace1
  pod_name: pod1
  container_name: container1
  labels:
    app: app1
    ...

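Since syslog-ng cannot handle nested fields, I used the nest filters to lift the labels up one level, and the modify filter to fall back to noapp when there is no app label. The result looks like this: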

kubernetes:
  namespace_name: namespace1
  pod_name: pod1
  container_name: container1
  app: app1
  ...

This enables syslog-ng to access fields like ${.SDATA.kubernetes.namespace_name} and ${.SDATA.kubernetes.app}.
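The order of the filters matters, so here is a rough Python model of the chain from values.yaml, applied to a plain dictionary. It only imitates the parts of the nest and modify filters used here (lift, Add, and nest with a wildcard) and is not fluent-bit code. The helper names are made up for this sketch.

```python
def lift(record, nested_under, add_prefix):
    """Like nest with Operation lift: move a nested map's keys up one level."""
    nested = record.get(nested_under)
    if not isinstance(nested, dict):
        return dict(record)  # nothing to lift
    out = {k: v for k, v in record.items() if k != nested_under}
    out.update({add_prefix + k: v for k, v in nested.items()})
    return out


def add_if_missing(record, key, value):
    """Like the modify filter's Add rule: set a key only if it is not present yet."""
    out = dict(record)
    out.setdefault(key, value)
    return out


def nest(record, prefix, nest_under):
    """Like nest with Operation nest, Wildcard prefix* and Remove_prefix prefix."""
    out, nested = {}, {}
    for key, value in record.items():
        if key.startswith(prefix):
            nested[key[len(prefix):]] = value
        else:
            out[key] = value
    if nested:
        out[nest_under] = nested
    return out


def flatten(record):
    """Apply the four steps in the same order as the filters in values.yaml."""
    record = lift(record, "kubernetes", "k8s_")      # kubernetes.* -> k8s_*
    record = lift(record, "k8s_labels", "k8s_")      # k8s_labels.app -> k8s_app
    record = add_if_missing(record, "k8s_app", "noapp")
    return nest(record, "k8s_", "kubernetes")        # k8s_* -> kubernetes.*


with_label = {
    "log": "hello",
    "kubernetes": {
        "namespace_name": "namespace1",
        "pod_name": "pod1",
        "container_name": "container1",
        "labels": {"app": "app1"},
    },
}
without_label = {"log": "hello", "kubernetes": {"namespace_name": "namespace2", "pod_name": "pod2"}}

print(flatten(with_label)["kubernetes"]["app"])     # app1
print(flatten(without_label)["kubernetes"]["app"])  # noapp
```

Because the Add rule runs after the labels have been lifted, a real app label always wins and noapp is only a fallback.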

Conclusion

It has been a few days since I set this up, and so far it’s running great.

I’ve also looked at some other solutions such as Grafana Loki, but I found them to be overkill for my needs. What I like about fluent-bit and syslog-ng is that the logs are just collected and stored in plain text, with minimal processing.
