Prepare videos for adaptive bitrate streaming with ffmpeg

Recently, I’ve been trying to learn how to serve video files with adaptive bitrate streaming. I have successfully prepared a video for adaptive bitrate streaming using ffmpeg and MP4Box, so this post will be a record of how I did it.

Prerequisites

  1. ffmpeg - to transcode video
  2. MP4Box - to package the video and create the MPD manifest file
  3. A video file to transcode

In this post, I will be using a sample mp4 video file downloaded from samplelib (the loudest one).

Get video information

First, check the video information using ffprobe.

ffprobe sample.mp4

Example output:

...
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'sample.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.44.100
  Duration: 00:00:30.47, start: 0.000000, bitrate: 5687 kb/s
  Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 5569 kb/s, 30 fps, 30 tbr, 15360 tbn (default)
    Metadata:
      handler_name    : ISO Media file produced by Google Inc. Created on: 08/17/2020.
      vendor_id       : [0][0][0][0]
  Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 126 kb/s (default)
    Metadata:
      handler_name    : ISO Media file produced by Google Inc. Created on: 08/17/2020.
      vendor_id       : [0][0][0][0]

Here, there are a few pieces of information that we need to take note of:

  1. Video resolution: 1920x1080
  2. Video bitrate: 5569 kb/s
  3. Video frame rate: 30 fps
  4. Audio bitrate: 126 kb/s
  5. Audio sample rate: 44100 Hz

Based on this, for video, we will cap the maximum resolution at 1080p and make sure the bitrate stays below 5569 kb/s. For audio, we will keep the sample rate at 44100 Hz and the bitrate at 126 kb/s.
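
If you only need those specific fields, ffprobe can print them directly instead of dumping everything. A quick sketch (the entry names below are standard ffprobe stream fields):

```shell
# Video: resolution, bitrate, and frame rate of the first video stream
ffprobe -v error -select_streams v:0 \
	-show_entries stream=width,height,bit_rate,r_frame_rate \
	-of default=noprint_wrappers=1 sample.mp4

# Audio: bitrate, sample rate, and channels of the first audio stream
ffprobe -v error -select_streams a:0 \
	-show_entries stream=bit_rate,sample_rate,channels \
	-of default=noprint_wrappers=1 sample.mp4
```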

Transcode video

ffmpeg -y -i sample.mp4 -c:v libx264 \
	-r 25 -x264opts 'keyint=50:min-keyint=50:no-scenecut' \
	-vf scale=-2:1080 -b:v 4300k -maxrate 4300k \
	-movflags faststart -bufsize 8600k \
	-profile:v main -preset fast -an "1080p.mp4"

Options

  1. -y - overwrite output file if it exists
  2. -i sample.mp4 - input file
  3. -c:v libx264 - use x264 codec for video
  4. -r 25 - set frame rate to 25 fps
  5. -x264opts 'keyint=50:min-keyint=50:no-scenecut' - set key frame interval to 2 seconds
    1. keyint=50 - key frame interval (every 50 frames = 2 seconds)
    2. min-keyint=50 - minimum key frame interval
    3. no-scenecut - disable scene cut detection to ensure key frames are placed at regular intervals
  6. -vf scale=-2:1080 - scale video to 1080p, keeping aspect ratio
  7. -b:v 4300k - set video bitrate to 4300 kbps
  8. -maxrate 4300k - set maximum video bitrate to 4300 kbps
  9. -movflags faststart - move moov atom to the beginning of the file for faster start (common in web-based streaming)
  10. -bufsize 8600k - set buffer size to 8600 kbps
  11. -profile:v main - uses the main profile of H.264, which is commonly supported
  12. -preset fast - set encoding speed to fast (can be changed to slower for better quality)
  13. -an - ignore audio

For keyint and min-keyint, the value is set to 50 because the video frame rate is 25 fps and we want a key frame every 2 seconds. Since we will set the segment duration to 4 seconds later when packaging the video, a key frame every 2 seconds ensures that every segment starts on a key frame and contains exactly two key frames.

A good practice is to set bufsize (buffer size) to 2x the maxrate (maximum bitrate).
ChatGPT gave a good analogy:

Imagine the encoder as trying to pour water (data) into a bucket (the buffer):

  1. b:v defines how fast the faucet should run on average.
  2. maxrate defines the peak flow rate allowed from the faucet.
  3. bufsize defines how big the bucket is.
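
Applying that rule to the bitrates used in this post, the bufsize for each rendition is simply double its maxrate. A trivial shell sketch of the arithmetic:

```shell
# bufsize = 2 x maxrate, for each maxrate (in kbps) from the profile table
for rate in 4300 2000 1050 235; do
	echo "maxrate=${rate}k -> bufsize=$((rate * 2))k"
done
```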

For my video, I produced the following different profiles:

tracks:
  - name: 1080p
    width: 1920
    height: 1080
    maxRate: 4300k
  - name: 720p
    width: 1280
    height: 720
    maxRate: 2000k
  - name: 480p
    width: 854
    height: 480
    maxRate: 1050k
  - name: 360p
    width: 640
    height: 360
    maxRate: 235k

So the full commands will look like this:

ffmpeg -y -i sample.mp4 -c:v libx264 \
	-r 25 -x264opts 'keyint=50:min-keyint=50:no-scenecut' \
	-vf scale=-2:1080 -b:v 4300k -maxrate 4300k \
	-movflags faststart -bufsize 8600k \
	-profile:v main -preset fast -an "1080p.mp4"

ffmpeg -y -i sample.mp4 -c:v libx264 \
    -r 25 -x264opts 'keyint=50:min-keyint=50:no-scenecut' \
    -vf scale=-2:720 -b:v 2000k -maxrate 2000k \
    -movflags faststart -bufsize 4000k \
    -profile:v main -preset fast -an "720p.mp4"

ffmpeg -y -i sample.mp4 -c:v libx264 \
    -r 25 -x264opts 'keyint=50:min-keyint=50:no-scenecut' \
    -vf scale=-2:480 -b:v 1050k -maxrate 1050k \
    -movflags faststart -bufsize 2100k \
    -profile:v main -preset fast -an "480p.mp4"

ffmpeg -y -i sample.mp4 -c:v libx264 \
    -r 25 -x264opts 'keyint=50:min-keyint=50:no-scenecut' \
    -vf scale=-2:360 -b:v 235k -maxrate 235k \
    -movflags faststart -bufsize 470k \
    -profile:v main -preset fast -an "360p.mp4"

Transcode audio

ffmpeg -y -i sample.mp4 -map 0:1 -vn -c:a aac -b:a 126k -ar 44100 -ac 2 audio1.m4a

Options

  1. -y - overwrite output file if it exists
  2. -i sample.mp4 - input file
  3. -map 0:1 - select stream #1 of the input (the audio stream in this file, as shown in the ffprobe output earlier)
  4. -vn - ignore video
  5. -c:a aac - use AAC codec for audio. AAC is a lossy audio codec that is widely used for streaming and is supported by most devices and browsers.
  6. -b:a 126k - set audio bitrate to 126 kbps to match the original file
  7. -ar 44100 - set audio sample rate to 44100 Hz to match the original file
  8. -ac 2 - set number of audio channels to 2 (stereo)

Package video and audio

It’s time to put everything together.

MP4Box -dash 4000 -frag 4000 -rap \
	-segment-name 'segment_$RepresentationID$_' -fps 25 \
	360p.mp4#video:id=360p \
	480p.mp4#video:id=480p \
	720p.mp4#video:id=720p \
	1080p.mp4#video:id=1080p \
	audio1.m4a#audio:id=English:role=main \
	-out dash/playlist.mpd

Options

  1. -dash 4000 - set DASH segment duration to 4 seconds (each segment file contains 4 seconds of video)
  2. -frag 4000 - set fragment duration to 4 seconds, so each segment contains only one fragment
  3. -rap - force segments to start at random access points (key frames)
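
The resulting playlist.mpd is plain XML, so one quick way to confirm that all five representations (four video, one audio) made it into the manifest is to grep it (the path assumes the -out value above):

```shell
# Count the Representation elements in the manifest; it should report 5
grep -c '<Representation' dash/playlist.mpd
```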

Test playout

There are multiple ways to test this. For example, with VLC,

vlc dash/playlist.mpd
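
To test in a browser instead, the manifest has to be served over HTTP, because the MediaSource API cannot load file:// URLs. Any static file server works; for example, with Python's built-in server (port 8000 is an arbitrary choice):

```shell
# Serve the packaged files over HTTP (Ctrl-C to stop),
# then open http://localhost:8000/playlist.mpd in a DASH player
python3 -m http.server 8000 --directory dash
```

Note that if you paste the URL into a hosted player such as the dash.js reference player, the server also needs to send CORS headers, which this built-in server does not.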

Conclusion

I have learned that video transcoding and packaging for adaptive bitrate streaming involves a lot of configurations and considerations, and I’m far from comprehending all of them. However, trying it out myself has been a great learning experience.

Extending on this, I have done some projects such as:

  1. Automating transcoding and packaging
  2. Automating thumbnail generation
  3. Serving mpeg-dash video with nginx
  4. Playing mpeg-dash video on a website

Just to familiarize myself with the whole process. Maybe some of them will make it to this blog in the future.
