Recently, I’ve been trying to learn how to serve video files with adaptive bitrate streaming. I have successfully prepared a video for adaptive bitrate streaming using ffmpeg and MP4Box, so this post will be a record of how I did it.
Prerequisites
In this post, I will be using a sample mp4 video file downloaded from samplelib (the loudest one).
Get video information
First, check the video information using ffprobe.
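The relevant parts of the output look roughly like this (trimmed; the stream layout is the usual one for an mp4 with one video and one audio track):

```
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'sample.mp4':
  ...
  Stream #0:0: Video: h264, yuv420p, 1920x1080, 5569 kb/s, 30 fps, ...
  Stream #0:1: Audio: aac, 44100 Hz, stereo, 126 kb/s
```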
Here, there are a few pieces of information we need to take note of:
- Video resolution: 1920x1080
- Video bitrate: 5569 kb/s
- Video frame rate: 30 fps
- Audio bitrate: 126 kb/s
- Audio sample rate: 44100 Hz
Based on this, for video, we will cap the maximum resolution at 1080p and keep the bitrate below 5569 kb/s. For audio, we will keep the sample rate at 44100 Hz and the bitrate at 126 kb/s.
Transcode video
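A sketch of the command, assembled from the options explained below (the output name `video_1080.mp4` is my own choice):

```shell
ffmpeg -y -i sample.mp4 \
  -c:v libx264 -r 25 \
  -x264opts 'keyint=50:min-keyint=50:no-scenecut' \
  -vf scale=-2:1080 \
  -b:v 4300k -maxrate 4300k -bufsize 8600k \
  -movflags faststart \
  -profile:v main -preset fast \
  -an video_1080.mp4
```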
Options
- `-y`: overwrite the output file if it exists
- `-i sample.mp4`: input file
- `-c:v libx264`: use the x264 codec for video
- `-r 25`: set frame rate to 25 fps
- `-x264opts 'keyint=50:min-keyint=50:no-scenecut'`: set the key frame interval to 2 seconds
  - `keyint=50`: key frame interval (every 50 frames = 2 seconds)
  - `min-keyint=50`: minimum key frame interval
  - `no-scenecut`: disable scene cut detection to ensure key frames are placed at regular intervals
- `-vf scale=-2:1080`: scale video to 1080p, keeping the aspect ratio
- `-b:v 4300k`: set video bitrate to 4300 kbps
- `-maxrate 4300k`: set maximum video bitrate to 4300 kbps
- `-movflags faststart`: move the moov atom to the beginning of the file for faster start (common in web-based streaming)
- `-bufsize 8600k`: set the buffer size to 8600 kb
- `-profile:v main`: use the main profile of H.264, which is widely supported
- `-preset fast`: set encoding speed to fast (can be changed to a slower preset for better quality)
- `-an`: ignore audio
For `keyint` and `min-keyint`, the value is set to 50 because the video frame rate is 25 fps and we want a key frame every 2 seconds. Since we will set the segment duration to 4 seconds later when packaging the video, having a key frame every 2 seconds ensures each segment contains two key frames.
A good practice is to set `bufsize` (buffer size) to 2x the `maxrate` (maximum bitrate).
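Both of these rules are simple arithmetic, and it can be handy to compute them per profile. A small sketch (the function names are my own):

```python
def gop_size(fps: int, keyframe_interval_s: int = 2) -> int:
    """Frames between key frames: one key frame every N seconds."""
    return fps * keyframe_interval_s

def buffer_size_k(maxrate_k: int) -> int:
    """Rule of thumb: bufsize = 2x maxrate."""
    return 2 * maxrate_k

print(gop_size(25))         # keyint for a 25 fps encode
print(buffer_size_k(4300))  # bufsize in kb for maxrate 4300k
```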
ChatGPT gave a good analogy:
Imagine the encoder as trying to pour water (data) into a bucket (the buffer):
- `b:v` defines how fast the faucet should run on average.
- `maxrate` defines the peak flow rate allowed from the faucet.
- `bufsize` defines how big the bucket is.
For my video, I produced the following different profiles:
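As an illustration of what such a ladder can look like: the 1080p row uses the numbers from above, while the lower rungs are my own picks, made by roughly halving the bitrate at each step and keeping `bufsize` at 2x `maxrate`:

| Profile | Resolution | Bitrate | Maxrate | Bufsize |
| --- | --- | --- | --- | --- |
| 1080p | 1920x1080 | 4300k | 4300k | 8600k |
| 720p | 1280x720 | 2400k | 2400k | 4800k |
| 480p | 854x480 | 1200k | 1200k | 2400k |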
Combining these options gives one full ffmpeg command per profile.
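As a sketch, assuming a three-rung ladder where the 1080p numbers come from the options above and the lower rungs use my own illustrative bitrates:

```shell
# 1080p (numbers from the options above)
ffmpeg -y -i sample.mp4 -c:v libx264 -r 25 \
  -x264opts 'keyint=50:min-keyint=50:no-scenecut' \
  -vf scale=-2:1080 -b:v 4300k -maxrate 4300k -bufsize 8600k \
  -movflags faststart -profile:v main -preset fast -an video_1080.mp4

# 720p (illustrative bitrate)
ffmpeg -y -i sample.mp4 -c:v libx264 -r 25 \
  -x264opts 'keyint=50:min-keyint=50:no-scenecut' \
  -vf scale=-2:720 -b:v 2400k -maxrate 2400k -bufsize 4800k \
  -movflags faststart -profile:v main -preset fast -an video_720.mp4

# 480p (illustrative bitrate)
ffmpeg -y -i sample.mp4 -c:v libx264 -r 25 \
  -x264opts 'keyint=50:min-keyint=50:no-scenecut' \
  -vf scale=-2:480 -b:v 1200k -maxrate 1200k -bufsize 2400k \
  -movflags faststart -profile:v main -preset fast -an video_480.mp4
```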
Transcode audio
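A sketch of the audio command, assembled from the options below (the output name `audio.m4a` is my own choice):

```shell
ffmpeg -y -i sample.mp4 -map 0:1 -vn \
  -c:a aac -b:a 126k -ar 44100 -ac 2 audio.m4a
```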
Options
- `-y`: overwrite the output file if it exists
- `-i sample.mp4`: input file
- `-map 0:1`: select stream 1 of the input, i.e. the audio stream
- `-vn`: ignore video
- `-c:a aac`: use the AAC codec for audio. AAC is a lossy audio codec that is widely used for streaming and is supported by most devices and browsers.
- `-b:a 126k`: set audio bitrate to 126 kbps to match the original file
- `-ar 44100`: set audio sample rate to 44100 Hz to match the original file
- `-ac 2`: set the number of audio channels to 2 (stereo)
Package video and audio
It’s time to put everything together.
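A sketch of the MP4Box command, using the options below; the manifest name and the input files are the ones I produced in the earlier steps:

```shell
MP4Box -dash 4000 -frag 4000 -rap \
  -out manifest.mpd \
  video_1080.mp4 video_720.mp4 video_480.mp4 audio.m4a
```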
Options
- `-dash 4000`: set the DASH segment duration to 4 seconds (each segment file contains 4 seconds of video)
- `-frag 4000`: each segment contains only one fragment
- `-rap`: force segments to start at random access points (key frames)
Test playout
There are multiple ways to test this. One is to open the manifest in VLC.
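For instance, serving the output directory over HTTP and pointing VLC at the manifest (names assumed from the packaging step above):

```shell
# serve the output directory over HTTP (any static file server works)
python3 -m http.server 8000

# then, in another terminal, open the manifest in VLC
vlc http://localhost:8000/manifest.mpd
```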
Conclusion
I have learned that video transcoding and packaging for adaptive bitrate streaming involves a lot of configurations and considerations, and I’m far from comprehending all of them. However, trying it out myself has been a great learning experience.
Extending this, I have done some projects such as:
- Automating transcoding and packaging
- Automating thumbnail generation
- Serving MPEG-DASH video with nginx
- Playing MPEG-DASH video on a website
These helped me familiarize myself with the whole process. Maybe some of them will make it to this blog in the future.