
NVIDIA-accelerated transcoding on Kubernetes

The whole purpose of the previous post, Feeding an NVIDIA GPU to k3s on Proxmox, was to make this one possible. This post covers how I got hardware-accelerated transcoding working in a k3s cluster with an NVIDIA GPU.

Environment

GPU: NVIDIA RTX 5060
Kubernetes: v1.32.6+k3s1
NVIDIA driver version: 570.158.01
NVIDIA GPU Operator version: v25.3.2

Initial attempt

With NVIDIA drivers and GPU Operator in place, I thought I could just run ffmpeg in a container to utilize hardware-accelerated transcoding. I was wrong.

This is the container I used to test it out:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nvidia-test
  namespace: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nvidia-test
  template:
    metadata:
      labels:
        app: nvidia-test
    spec:
      runtimeClassName: nvidia
      containers:
        - name: nvidia-test
          image: nvidia/cuda:12.0.0-base-ubuntu22.04
          imagePullPolicy: IfNotPresent
          command:
            - sleep
            - "3600"
          resources:
            limits:
              nvidia.com/gpu: "1"
      restartPolicy: Always

Once the pod was running, I copied a test video in, installed ffmpeg, and tried a transcode:

kubectl cp input.mp4 nvidia-test-774694874d-p4226:/root/input.mp4
kubectl exec -it nvidia-test-774694874d-p4226 -- bash
apt update && apt install -y ffmpeg
cd
ffmpeg -y -hwaccel cuda -i input.mp4 -vf format=yuv420p -c:v h264_nvenc output.mp4

And that got ffmpeg yelling at me:

[h264 @ 0x5b2e39aad740] Cannot load libnvcuvid.so.1
[h264 @ 0x5b2e39aad740] Failed loading nvcuvid.
[h264 @ 0x5b2e39aad740] Failed setup for format cuda: hwaccel initialisation returned error.
[h264_nvenc @ 0x5b2e386c3080] Cannot load libnvidia-encode.so.1
[h264_nvenc @ 0x5b2e386c3080] The minimum required Nvidia driver for nvenc is (unknown) or newer
Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height
Conversion failed!

Seems like the container was missing two libraries: libnvcuvid.so.1 (used for hardware decoding) and libnvidia-encode.so.1 (used for hardware encoding).

With some prompt engineering, I figured out what I needed. The libnvidia-encode-570-server package had to be installed for hardware-accelerated encoding to work, and since I had installed all the drivers on the worker node VM, that's where I needed to install it.
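The right package name depends on the driver branch in use, so it's worth checking before installing. A sketch of how to locate it (the commands are standard, but the exact package name will differ with your driver version and distribution):

```shell
# Which driver branch is the node running? (prints e.g. 570.158.01)
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# List the available encode-library packages and pick the matching branch
apt-cache search libnvidia-encode
```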

Making the node happy

Installing the library was easy enough:

apt install -y libnvidia-encode-570-server

But when I restarted the pod and tried to run ffmpeg again, it yelled at me as if nothing had changed:

[h264 @ 0x5b2e39aad740] Cannot load libnvcuvid.so.1
...

Since restarting the GPU Operator didn't help either, I decided to run the same test directly on the worker node.
And there it worked, blazingly fast.

$ ffmpeg -y -hwaccel cuda -i input.mp4 -vf format=yuv420p -c:v h264_nvenc output.mp4
...
frame= 1259 fps=1249 q=19.0 Lsize=    9985kB time=00:00:41.83 bitrate=1955.3kbits/s speed=41.5x

So, it works on the node, but not on the pod. Time for some debugging.

Making the pod happy

Now with only the container yelling at me, I started to compare the happy and unhappy environments.

Node:

jy@gwork01:~$ ldconfig -p | grep nvidia
        libnvidia-ptxjitcompiler.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1
        libnvidia-ptxjitcompiler.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so
        libnvidia-pkcs11.so.570.158.01 (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvidia-pkcs11.so.570.158.01
        libnvidia-pkcs11-openssl3.so.570.158.01 (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvidia-pkcs11-openssl3.so.570.158.01
        libnvidia-opticalflow.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvidia-opticalflow.so.1
        libnvidia-opticalflow.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvidia-opticalflow.so
        libnvidia-opencl.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvidia-opencl.so.1
        libnvidia-nvvm.so.4 (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvidia-nvvm.so.4
        libnvidia-nvvm.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvidia-nvvm.so
        libnvidia-ml.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvidia-ml.so.1
        libnvidia-ml.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvidia-ml.so
        libnvidia-encode.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvidia-encode.so.1
        libnvidia-encode.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvidia-encode.so
        libnvidia-cfg.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvidia-cfg.so.1
        libnvidia-cfg.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvidia-cfg.so

Container:

root@nvidia-test-8575cb6df5-zvvrb:/# ldconfig -p | grep nvidia
        libnvidia-ptxjitcompiler.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1
        libnvidia-pkcs11.so.570.158.01 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvidia-pkcs11.so.570.158.01
        libnvidia-pkcs11-openssl3.so.570.158.01 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvidia-pkcs11-openssl3.so.570.158.01
        libnvidia-opencl.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1
        libnvidia-nvvm.so.4 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.4
        libnvidia-ml.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
        libnvidia-cfg.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.1

Clearly, the libraries related to encoding were missing in the container, but I didn’t know why.

So I started some vigorous prompt engineering, Google searches, and documentation reading, looking for the things that would make it happy. But like that cat at my friend's house, no matter what I tried, it just wouldn't give a damn.

That was until I stumbled across this magical catnip: running ffmpeg with nvenc inside nvidia docker.

Apparently, there is an environment variable called NVIDIA_DRIVER_CAPABILITIES that controls which driver libraries are made available inside the container, and it has to be set on the container that runs ffmpeg. According to the official NVIDIA Container Toolkit documentation, Specialized Configurations with Docker, the default value is compute,utility, and video needs to be added to it for the video encoding libraries to become available inside the container.

I double-checked that the environment variable NVIDIA_DRIVER_CAPABILITIES was set in the container to the default value:

$ echo $NVIDIA_DRIVER_CAPABILITIES
compute,utility
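The same thing can be checked non-interactively from outside the pod (assuming the deployment name used above):

```shell
kubectl exec deploy/nvidia-test -- printenv NVIDIA_DRIVER_CAPABILITIES
```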

and then updated my deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nvidia-test
  namespace: test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nvidia-test
  template:
    metadata:
      labels:
        app: nvidia-test
    spec:
      runtimeClassName: nvidia
      containers:
        - name: nvidia-test
          image: nvidia/cuda:12.0.0-base-ubuntu22.04
          imagePullPolicy: IfNotPresent
          command:
            - sleep
            - "3600"
          env:
            - name: NVIDIA_DRIVER_CAPABILITIES
              value: "compute,utility,video"
          resources:
            limits:
              nvidia.com/gpu: "1"
      restartPolicy: Always

After applying the change, I confirmed that the encoding library now existed in the container:

$ kubectl exec -it nvidia-test-774694874d-p4226 -- bash
# ldconfig -p | grep nvidia
    ...
    libnvidia-encode.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1
    ...

Then came the moment of truth:

ffmpeg -y -hwaccel cuda -i input.mp4 -vf format=yuv420p -c:v h264_nvenc output.mp4

And it worked!
ffmpeg happily told me how it went:

frame= 1415 fps=1247 q=34.0 Lsize=    5969kB time=00:00:23.59 bitrate=2072.3kbits/s speed=20.8x
video:5929kB audio:7kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.550812%
[aac @ 0x5ba88a060a00] Qavg: 65536.000

And I could even see the GPU activity from the worker node:

$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.158.01             Driver Version: 570.158.01     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5060        Off |   00000000:01:00.0 Off |                  N/A |
|  0%   37C    P1             34W /  145W |     168MiB /   8151MiB |      8%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A           68572      C   ffmpeg                                  159MiB |
+-----------------------------------------------------------------------------------------+

I’m not going to post all the details here, but I tried some container images other than the NVIDIA ones, and they all worked as long as:

  1. The NVIDIA_DRIVER_CAPABILITIES environment variable was set to compute,utility,video
  2. The container was run with the nvidia runtime class
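Condensed, those two requirements boil down to this pod-spec fragment (names and image are illustrative):

```yaml
spec:
  runtimeClassName: nvidia              # requirement 2: the nvidia runtime class
  containers:
    - name: transcoder                  # illustrative name
      image: your-ffmpeg-image          # any image with an nvenc-enabled ffmpeg
      env:
        - name: NVIDIA_DRIVER_CAPABILITIES
          value: "compute,utility,video"  # requirement 1: expose the video libraries
      resources:
        limits:
          nvidia.com/gpu: "1"
```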

Conclusion

The fact that the NVIDIA_DRIVER_CAPABILITIES environment variable has to be set on the container running ffmpeg was a bit counter-intuitive to me, and it isn't well documented, at least not in the NVIDIA GPU Operator documentation. Maybe it's a niche use case, maybe I overlooked some things, but it was a pain to figure out.

Once I learned how these pieces fit together, though, running hardware-accelerated transcoding in k3s with an NVIDIA GPU turned out to be pretty straightforward.
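For future reference, here's a quick smoke test that needs no input file: it generates five seconds of ffmpeg's built-in test pattern and encodes it with NVENC (the pod name is a placeholder for whatever pod you're testing):

```shell
kubectl exec -it <pod-name> -- ffmpeg -y -f lavfi \
  -i testsrc=duration=5:size=1280x720:rate=30 \
  -c:v h264_nvenc /tmp/out.mp4
```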
