Prepare videos for adaptive bitrate streaming with ffmpeg

Recently, I’ve been trying to learn how to serve video files with adaptive bitrate streaming. I have successfully prepared a video for adaptive bitrate streaming using ffmpeg and MP4Box, so this post will be a record of how I did it.

Prerequisites

  1. ffmpeg - to transcode video
  2. MP4Box - to package the video and create the MPD manifest file
  3. A video file to transcode

In this post, I will be using a sample mp4 video file downloaded from samplelib (the loudest one).

Get video information

First, check the video information using ffprobe.

ffprobe sample.mp4

Example output:

...
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'sample.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.44.100
  Duration: 00:00:30.47, start: 0.000000, bitrate: 5687 kb/s
  Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 5569 kb/s, 30 fps, 30 tbr, 15360 tbn (default)
    Metadata:
      handler_name    : ISO Media file produced by Google Inc. Created on: 08/17/2020.
      vendor_id       : [0][0][0][0]
  Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 126 kb/s (default)
    Metadata:
      handler_name    : ISO Media file produced by Google Inc. Created on: 08/17/2020.
      vendor_id       : [0][0][0][0]

Here, there are a few pieces of information that we need to take note of:

  1. Video resolution: 1920x1080
  2. Video bitrate: 5569 kb/s
  3. Video frame rate: 30 fps
  4. Audio bitrate: 126 kb/s
  5. Audio sample rate: 44100 Hz

Based on this, for video, we will cap the maximum resolution at 1080p and make sure the bitrate stays below 5569 kb/s. For audio, we will keep the sample rate at 44100 Hz and the bitrate at 126 kb/s.
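
If you only need those specific fields, ffprobe can print them directly instead of dumping everything. A quick sketch (the entry names below are standard ffprobe stream fields):

```shell
# Video: resolution, bitrate, and frame rate of the first video stream
ffprobe -v error -select_streams v:0 \
	-show_entries stream=width,height,bit_rate,r_frame_rate \
	-of default=noprint_wrappers=1 sample.mp4

# Audio: bitrate, sample rate, and channels of the first audio stream
ffprobe -v error -select_streams a:0 \
	-show_entries stream=bit_rate,sample_rate,channels \
	-of default=noprint_wrappers=1 sample.mp4
```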

Transcode video

ffmpeg -y -i sample.mp4 -c:v libx264 \
	-r 25 -x264opts 'keyint=50:min-keyint=50:no-scenecut' \
	-vf scale=-2:1080 -b:v 4300k -maxrate 4300k \
	-movflags faststart -bufsize 8600k \
	-profile:v main -preset fast -an "1080p.mp4"

Options

  1. -y - overwrite output file if it exists
  2. -i sample.mp4 - input file
  3. -c:v libx264 - use x264 codec for video
  4. -r 25 - set frame rate to 25 fps
  5. -x264opts 'keyint=50:min-keyint=50:no-scenecut' - set key frame interval to 2 seconds
    1. keyint=50 - key frame interval (every 50 frames = 2 seconds)
    2. min-keyint=50 - minimum key frame interval
    3. no-scenecut - disable scene cut detection to ensure key frames are placed at regular intervals
  6. -vf scale=-2:1080 - scale video to 1080p, keeping aspect ratio
  7. -b:v 4300k - set video bitrate to 4300 kbps
  8. -maxrate 4300k - set maximum video bitrate to 4300 kbps
  9. -movflags faststart - move moov atom to the beginning of the file for faster start (common in web-based streaming)
  10. -bufsize 8600k - set buffer size to 8600 kbps
  11. -profile:v main - uses the main profile of H.264, which is commonly supported
  12. -preset fast - set encoding speed to fast (can be changed to slower for better quality)
  13. -an - ignore audio

For keyint and min-keyint, the value is set to 50 because the video frame rate is 25 fps and we want a key frame every 2 seconds. Since we will set the segment duration to 4 seconds later when packaging the video, a key frame every 2 seconds ensures that every segment starts on a key frame and contains exactly two key frames.

A good practice is to set bufsize (buffer size) to 2x the maxrate (maximum bitrate).
ChatGPT gave a good analogy:

Imagine the encoder as trying to pour water (data) into a bucket (the buffer):

  1. b:v defines how fast the faucet should run on average.
  2. maxrate defines the peak flow rate allowed from the faucet.
  3. bufsize defines how big the bucket is.
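
Applying that rule to the bitrates used in this post, the bufsize for each rendition is simply double its maxrate. A trivial shell sketch of the arithmetic:

```shell
# bufsize = 2 x maxrate, for each maxrate (in kbps) from the profile table
for rate in 4300 2000 1050 235; do
	echo "maxrate=${rate}k -> bufsize=$((rate * 2))k"
done
```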

For my video, I produced the following different profiles:

tracks:
  - name: 1080p
    width: 1920
    height: 1080
    maxRate: 4300k
  - name: 720p
    width: 1280
    height: 720
    maxRate: 2000k
  - name: 480p
    width: 854
    height: 480
    maxRate: 1050k
  - name: 360p
    width: 640
    height: 360
    maxRate: 235k

So the full commands will look like this:

ffmpeg -y -i sample.mp4 -c:v libx264 \
	-r 25 -x264opts 'keyint=50:min-keyint=50:no-scenecut' \
	-vf scale=-2:1080 -b:v 4300k -maxrate 4300k \
	-movflags faststart -bufsize 8600k \
	-profile:v main -preset fast -an "1080p.mp4"

ffmpeg -y -i sample.mp4 -c:v libx264 \
    -r 25 -x264opts 'keyint=50:min-keyint=50:no-scenecut' \
    -vf scale=-2:720 -b:v 2000k -maxrate 2000k \
    -movflags faststart -bufsize 4000k \
    -profile:v main -preset fast -an "720p.mp4"

ffmpeg -y -i sample.mp4 -c:v libx264 \
    -r 25 -x264opts 'keyint=50:min-keyint=50:no-scenecut' \
    -vf scale=-2:480 -b:v 1050k -maxrate 1050k \
    -movflags faststart -bufsize 2100k \
    -profile:v main -preset fast -an "480p.mp4"

ffmpeg -y -i sample.mp4 -c:v libx264 \
    -r 25 -x264opts 'keyint=50:min-keyint=50:no-scenecut' \
    -vf scale=-2:360 -b:v 235k -maxrate 235k \
    -movflags faststart -bufsize 470k \
    -profile:v main -preset fast -an "360p.mp4"

Transcode audio

ffmpeg -y -i sample.mp4 -map 0:1 -vn -c:a aac -b:a 126k -ar 44100 -ac 2 audio1.m4a

Options

  1. -y - overwrite output file if it exists
  2. -i sample.mp4 - input file
  3. -map 0:1 - select stream #1 of the input (the audio stream in this file, as shown in the ffprobe output earlier)
  4. -vn - ignore video
  5. -c:a aac - use AAC codec for audio. AAC is a lossy audio codec that is widely used for streaming and is supported by most devices and browsers.
  6. -b:a 126k - set audio bitrate to 126 kbps to match the original file
  7. -ar 44100 - set audio sample rate to 44100 Hz to match the original file
  8. -ac 2 - set number of audio channels to 2 (stereo)

Package video and audio

It’s time to put everything together.

MP4Box -dash 4000 -frag 4000 -rap \
	-segment-name 'segment_$RepresentationID$_' -fps 25 \
	360p.mp4#video:id=360p \
	480p.mp4#video:id=480p \
	720p.mp4#video:id=720p \
	1080p.mp4#video:id=1080p \
	audio1.m4a#audio:id=English:role=main \
	-out dash/playlist.mpd

Options

  1. -dash 4000 - set DASH segment duration to 4 seconds (each segment file contains 4 seconds of video)
  2. -frag 4000 - set fragment duration to 4 seconds, so each segment contains only one fragment
  3. -rap - force segments to start at random access points (key frames)
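
The resulting playlist.mpd is plain XML, so one quick way to confirm that all five representations (four video, one audio) made it into the manifest is to grep it (the path assumes the -out value above):

```shell
# Count the Representation elements in the manifest; it should report 5
grep -c '<Representation' dash/playlist.mpd
```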

Test playout

There are multiple ways to test this. For example, with VLC,

vlc dash/playlist.mpd
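
To test in a browser instead, the manifest has to be served over HTTP, because the MediaSource API cannot load file:// URLs. Any static file server works; for example, with Python's built-in server (port 8000 is an arbitrary choice):

```shell
# Serve the packaged files over HTTP (Ctrl-C to stop),
# then open http://localhost:8000/playlist.mpd in a DASH player
python3 -m http.server 8000 --directory dash
```

Note that if you paste the URL into a hosted player such as the dash.js reference player, the server also needs to send CORS headers, which this built-in server does not.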

Conclusion

I have learned that video transcoding and packaging for adaptive bitrate streaming involves a lot of configurations and considerations, and I’m far from comprehending all of them. However, trying it out myself has been a great learning experience.

Extending on this, I have done some projects such as:

  1. Automating transcoding and packaging
  2. Automating thumbnail generation
  3. Serving mpeg-dash video with nginx
  4. Playing mpeg-dash video on a website

Just to familiarize myself with the whole process. Maybe some of them will make it to this blog in the future.
