Transcoding assets for Media Source Extensions

When working with Media Source Extensions (MSE), you will likely need to prepare your assets before you can stream them. This article walks through the requirements and shows a toolchain you can use to encode your assets appropriately.

Getting started

  1. The first and most important step is to ensure that your files use a container and codec that users' browsers support.
  2. Depending on the codec, you might need to fragment the file to comply with the ISO Base Media File Format (ISO BMFF) spec.
  3. (Optional) If you decide to use Dynamic Adaptive Streaming over HTTP (DASH) for adaptive bitrate streaming, you need to transcode your assets into multiple resolutions. Most DASH clients expect a corresponding Media Presentation Description (MPD) manifest file, which is typically generated while generating the multiple resolution asset files.

Below we'll cover all of these steps, but first let's look at a toolchain we can use to do this fairly easily.

Sample Media

If you want to follow the steps listed here but don't have any media to experiment with, you can grab the trailer for Big Buck Bunny. Big Buck Bunny is copyrighted by the Blender Foundation and licensed under the Creative Commons Attribution 3.0 license. Throughout this tutorial, you'll see the filename trailer_1080p.mov, which is the downloaded trailer file.

Tools required

When working with MSE, the following tools are required:

  1. ffmpeg: A command-line utility for transcoding your media into the required formats. You can download a version for your system from the Download FFmpeg page. Extract the executable from the archive file and add its location to your PATH. macOS users can also use Homebrew to install ffmpeg.
  2. Bento4: A set of command-line utilities for getting asset metadata and creating content for DASH. You can build it yourself from the provided project or source files, depending on your OS and preferences (see the Building instructions for details), or download a prebuilt package. Put the contents of the bin directory in the same place as ffmpeg.
  3. python2: Required to run Bento4's Python scripts.

Get these installed successfully before moving to the next step.

Place the sample media in the Bento4 utils directory and run the commands in this article from there.

Note: Prebuilt ffmpeg binaries do not include libfdk_aac for licensing reasons. Bento4 uses this encoder by default, so if you need it you must compile ffmpeg yourself with libfdk_aac enabled. If you don't need it, add --audio-codec=aac to the mp4-dash-encode.py command line.

Container and Codec Support

As specified in section 1.1 of the MSE spec: Goals, MSE is designed not to require support for any particular media format or codec. While this is true on paper, browser support varies for specific container/codec combinations.

To check whether the browser supports a particular container, pass its MIME type as a string to the MediaSource.isTypeSupported method:

js

MediaSource.isTypeSupported("audio/mp3"); // false
MediaSource.isTypeSupported("video/mp4"); // true
MediaSource.isTypeSupported('video/mp4; codecs="avc1.4D4028, mp4a.40.2"'); // true

The string is the MIME type of the container, optionally followed by a list of codecs. While the MIME type is fairly simple to figure out, we can get the codec string using the mp4info utility.

Currently, MP4 containers with H.264 video and AAC audio are supported across all modern browsers, while support for other container/codec combinations varies.
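
As a minimal sketch of how the type string described above can be assembled and checked, the snippet below joins a container MIME type with a codec list and guards the call to MediaSource.isTypeSupported so that it also runs where MSE is unavailable. The helper names buildTypeString and canPlayWithMSE are illustrative assumptions and not part of any API.

```js
// Hypothetical helper: join a container MIME type and an optional codec list
// into the string format accepted by MediaSource.isTypeSupported().
function buildTypeString(container, codecs = []) {
  if (codecs.length === 0) {
    return container;
  }
  return `${container}; codecs="${codecs.join(", ")}"`;
}

// Hypothetical helper: returns false instead of throwing when the
// MediaSource API does not exist in the current environment.
function canPlayWithMSE(container, codecs = []) {
  if (typeof MediaSource === "undefined") {
    return false;
  }
  return MediaSource.isTypeSupported(buildTypeString(container, codecs));
}

const typeString = buildTypeString("video/mp4", ["avc1.4D4028", "mp4a.40.2"]);
console.log(typeString); // video/mp4; codecs="avc1.4D4028, mp4a.40.2"
console.log(canPlayWithMSE("video/mp4", ["avc1.4D4028", "mp4a.40.2"]));
```

The result of the second call depends on the browser, so checking it at runtime is safer than hard-coding an assumption about support.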

To convert our sample media from a QuickTime MOV container to an MP4 container, we can use ffmpeg. Because the audio in the MOV container is already AAC and the video is already H.264, we can instruct ffmpeg to copy the audio and video tracks over as they are, which is faster than transcoding them.

bash

$ ffmpeg -i trailer_1080p.mov -c:v copy -c:a copy bunny.mp4
$ ls
bunny.mp4         trailer_1080p.mov

Checking Fragmentation

To stream MP4 properly, the asset must be a fragmented ISO BMFF MP4. Fragmentation means that metadata is spread throughout the container rather than lumped together in one place. Without proper fragmentation, a given MP4 file is not guaranteed to work with MSE.

To check whether an MP4 file is a proper MP4 stream, you can again use the mp4info utility to list the atoms of an MP4.

Note: The fragmented version is slightly larger than the original, due to additional metadata spread throughout the file. This is usually a file size increase of 1 percent or less.
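
To make the check concrete, here is a sketch that walks the top-level boxes (atoms) of an MP4 held in memory and reports whether a moof (movie fragment) box is present, which is the marker of a fragmented file. It assumes the bytes are already available as a Uint8Array; the function names are illustrative, the sample data is synthetic rather than a real video, and the sketch is not a substitute for inspecting a file with Bento4's tools.

```js
// Return the types of the top-level ISO BMFF boxes found in `bytes` (a Uint8Array).
// Each box starts with a 32-bit big-endian size followed by a 4-character type.
function listTopLevelBoxes(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const types = [];
  let offset = 0;
  while (offset + 8 <= bytes.byteLength) {
    let size = view.getUint32(offset);
    types.push(String.fromCharCode(...bytes.subarray(offset + 4, offset + 8)));
    if (size === 1) {
      // A size of 1 means a 64-bit size follows the type field.
      if (offset + 16 > bytes.byteLength) break;
      size = Number(view.getBigUint64(offset + 8));
    } else if (size === 0) {
      // A size of 0 means the box runs to the end of the file.
      break;
    }
    if (size < 8) break; // malformed box; stop rather than loop forever
    offset += size;
  }
  return types;
}

// A fragmented MP4 contains at least one top-level moof box.
function isFragmented(bytes) {
  return listTopLevelBoxes(bytes).includes("moof");
}

// Demo helpers: build empty placeholder boxes to exercise the parser.
function makeBox(type, payloadLength = 0) {
  const box = new Uint8Array(8 + payloadLength);
  new DataView(box.buffer).setUint32(0, box.length);
  for (let i = 0; i < 4; i++) box[4 + i] = type.charCodeAt(i);
  return box;
}

function concatBoxes(...boxes) {
  const out = new Uint8Array(boxes.reduce((total, b) => total + b.length, 0));
  let position = 0;
  for (const b of boxes) {
    out.set(b, position);
    position += b.length;
  }
  return out;
}

const fragmentedSample = concatBoxes(makeBox("ftyp", 4), makeBox("moov", 8), makeBox("moof", 2), makeBox("mdat", 16));
console.log(listTopLevelBoxes(fragmentedSample).join(", ")); // ftyp, moov, moof, mdat
console.log(isFragmented(fragmentedSample)); // true
```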

Fragmenting

If you have an asset that is not already an MP4, ffmpeg can emit a properly fragmented MP4 during conversion when given the -movflags frag_keyframe+empty_moov command-line flag:

bash

ffmpeg -i trailer_1080p.mov -c:v copy -c:a copy -movflags frag_keyframe+empty_moov bunny_fragmented.mp4

If you already have an MP4, but it's not properly fragmented, you can again use ffmpeg:

bash

ffmpeg -i non_fragmented.mp4 -movflags frag_keyframe+empty_moov fragmented.mp4

In both cases, Chrome may require an extra movie flag to be set:

bash

-movflags frag_keyframe+empty_moov+default_base_moof
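
If you script these conversions, the flag combinations above can be collected in one place. The sketch below only assembles an ffmpeg argument list and does not run ffmpeg; buildFragmentArgs and its copyStreams and chromeCompat options are hypothetical names, and the only ffmpeg flags used are the ones shown in this article.

```js
// Hypothetical helper: build the ffmpeg arguments for producing a fragmented MP4.
// copyStreams adds "-c:v copy -c:a copy"; chromeCompat appends default_base_moof.
function buildFragmentArgs(input, output, { copyStreams = true, chromeCompat = false } = {}) {
  const movflags = ["frag_keyframe", "empty_moov"];
  if (chromeCompat) {
    movflags.push("default_base_moof");
  }
  const args = ["-i", input];
  if (copyStreams) {
    args.push("-c:v", "copy", "-c:a", "copy");
  }
  args.push("-movflags", movflags.join("+"), output);
  return args;
}

const fragmentArgs = buildFragmentArgs("trailer_1080p.mov", "bunny_fragmented.mp4");
console.log(["ffmpeg", ...fragmentArgs].join(" "));
// ffmpeg -i trailer_1080p.mov -c:v copy -c:a copy -movflags frag_keyframe+empty_moov bunny_fragmented.mp4
```

In Node.js, an array like this can be passed as the second argument to child_process.spawn("ffmpeg", args), which avoids shell quoting problems with filenames.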

Having a properly fragmented MP4 file is all you need to get started. If you wish to employ adaptive bitrate streaming, you'll have to create encodings at multiple resolutions. While MSE is flexible enough to let you build your own implementation, using an existing DASH client is highly recommended, as DASH is a well-specified application protocol.

Creating Content for DASH

Given that you have ffmpeg and Bento4's utilities accessible through your $PATH, you can run Bento4's mp4-dash-encode.py Python script to generate multiple encodings of your content at various resolutions. Bento4's mp4-dash.py Python script can then be used to generate the corresponding MPD file needed by clients.

Run the following commands (shown with sample output):

bash

$ python mp4-dash-encode.py -b 5 -v bunny_fragmented.mp4
Encoding 5 bitrates, min bitrate = 500.0 max bitrate = 2000.0
Media Source: Video: resolution=640x360
ENCODING bitrate: 500, resolution: 256x144
ENCODING bitrate: 875, resolution: 384x216
ENCODING bitrate: 1250, resolution: 480x270
ENCODING bitrate: 1625, resolution: 560x316
ENCODING bitrate: 2000, resolution: 640x360

$ python mp4-dash.py video_0*
Parsing media file 1: video_00500.mp4
Parsing media file 2: video_00875.mp4
Parsing media file 3: video_01250.mp4
Parsing media file 4: video_01625.mp4
Parsing media file 5: video_02000.mp4
Splitting media file (audio) video_00500.mp4
Splitting media file (video) video_00500.mp4
Splitting media file (video) video_00875.mp4
Splitting media file (video) video_01250.mp4
Splitting media file (video) video_01625.mp4
Splitting media file (video) video_02000.mp4

$ tree -L 2 output
output
├── audio
│   └── und
├── stream.mpd
└── video
    ├── 1
    ├── 2
    ├── 3
    ├── 4
    └── 5

8 directories, 1 file

Note: mp4-dash-encode.py does not display ffmpeg error messages by default. You can see them by specifying the -d option.

Note: If the error message "Invalid duration specification for force_key_frames: 'expr:eq(mod(n" is displayed, edit mp4-dash-encode.py and remove the two single quotes (') from "-force_key_frames 'expr:eq(mod(n,%d),0)'".
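
The bitrates in the sample output above are consistent with even spacing between the minimum and maximum, and the intermediate filenames match the bitrate zero-padded to five digits. The sketch below reproduces those numbers, which can help when planning a bitrate ladder. It is inferred from the sample output only and is not taken from Bento4's source; bitrateLadder and outputName are hypothetical names.

```js
// Hypothetical helper: `count` evenly spaced bitrates from `min` to `max` inclusive.
function bitrateLadder(count, min, max) {
  if (count < 2) {
    return [min];
  }
  const step = (max - min) / (count - 1);
  return Array.from({ length: count }, (_, i) => Math.round(min + i * step));
}

// Hypothetical helper: filename pattern seen in the sample output (video_00500.mp4).
function outputName(bitrate) {
  return `video_${String(bitrate).padStart(5, "0")}.mp4`;
}

const ladder = bitrateLadder(5, 500, 2000);
console.log(ladder.join(", ")); // 500, 875, 1250, 1625, 2000
console.log(ladder.map(outputName).join(", "));
// video_00500.mp4, video_00875.mp4, video_01250.mp4, video_01625.mp4, video_02000.mp4
```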

Summary

With your video properly encoded and adaptive bitrate media generated, you're now ready to begin adaptive bitrate streaming on the web using DASH and MSE.