SMIL manifest

Input specification using a SMIL file

Input SMIL syntax

SMIL is an XML format for multimedia presentations. It can be used instead of an HLS master playlist or a DASH Media Presentation Description (MPD) to specify which media files should be combined into an asset, together with some associated metadata parameters. The level of detail is, however, much lower than in those formats. ew-vodingest supports importing MP4 files for video and audio, together with subtitle files.

Our usage of SMIL follows the legacy Wowza format, but we have added some parameters, such as displayName for HLS and role for subtitles.

Basic structure

SMIL files should specify all relevant media files in a switch block in the body:

<?xml version="1.0" encoding="UTF-8"?>
<smil>
    <body>
        <switch>
            <video src="movie1.mp4" ... />
            <video src="audio.mp4" ... />
            <textstream ... />
            <srt ... />
        </switch>
    </body>
</smil>

Stream types

The supported stream types are <video>, which can contain video, audio, or both; <textstream>, which is a subtitle file; and <srt>, which is a subtitle file in SRT format.
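As an illustration of the structure, the stream elements inside <body>/<switch> can be listed with Python's standard xml.etree.ElementTree module. This is a minimal sketch, not ew-vodingest's implementation; the function name list_streams and the (tag, src) tuples it returns are hypothetical.

```python
import xml.etree.ElementTree as ET

# The three stream element tags described above; other tags are skipped in this sketch.
STREAM_TAGS = {"video", "textstream", "srt"}


def list_streams(smil_text: str) -> list:
    """Return (tag, src) for every stream element in <body>/<switch>."""
    root = ET.fromstring(smil_text)
    streams = []
    for elem in root.findall("./body/switch/*"):
        if elem.tag in STREAM_TAGS:
            streams.append((elem.tag, elem.get("src", "")))
    return streams


example = """<?xml version="1.0" encoding="UTF-8"?>
<smil>
    <body>
        <switch>
            <video src="movie1.mp4"/>
            <video src="audio.mp4"/>
            <textstream src="subtitles.ttml"/>
            <srt src="swedish.srt"/>
        </switch>
    </body>
</smil>"""

print(list_streams(example))
```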

Video/Audio streams

Audio and video streams are specified with the <video> tag.

Minimal configuration

The simplest variant is to just give a src attribute:

<video src="videoAndAudio.mp4"/>

If this is the only information given, all video and audio tracks are extracted and named according to the following patterns:

media type   pattern                           example
video        video_<codec>_<bitrate>           video_hevc_9000kbps
audio        audio_<codec>_<lang>_<bitrate>    audio_aac_en_256kbps

The exact names depend on which tracks and codecs are available. The bitrate is calculated from the file size and duration, and the language is taken from the elng box of the mp4 file if present, with the mdhd box as a fallback.
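The naming patterns in the table above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names estimate_bitrate_bps and av_track_name are hypothetical, the file is assumed to hold a single track so that its size and duration give that track's average bitrate, and rounding to whole kbps is an assumption rather than documented behavior.

```python
def estimate_bitrate_bps(size_bytes: int, duration_s: float) -> int:
    """Average bitrate in bits per second (assumes one track per file)."""
    return round(size_bytes * 8 / duration_s)


def av_track_name(media_type: str, codec: str, bitrate_bps: int, lang: str = "") -> str:
    """Build a track name following the video/audio patterns (rounded to kbps)."""
    kbps = f"{round(bitrate_bps / 1000)}kbps"
    if media_type == "video":
        return f"video_{codec}_{kbps}"
    if media_type == "audio":
        return f"audio_{codec}_{lang}_{kbps}"
    raise ValueError(f"unsupported media type: {media_type}")


# 11.25 MB over 10 s -> 9 Mbit/s
print(av_track_name("video", "hevc", estimate_bitrate_bps(11_250_000, 10.0)))
print(av_track_name("audio", "aac", 256_000, "en"))
```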

If multiple files are specified in the SMIL file, tracks are extracted from all of them, but if the resulting names (media type, codec, language, and bitrate) coincide, only one copy is kept. This makes it possible to import files that all contain the same audio (same bitrate and language) but differ in video bitrate.
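The de-duplication rule can be pictured as keeping one track per name across all files. In this minimal sketch the function name deduplicate is hypothetical, and keeping the first occurrence is an assumption; the text above only states that one copy is kept.

```python
def deduplicate(track_names_per_file: list) -> list:
    """Keep one track per name across all files (first occurrence wins in this sketch)."""
    seen = set()
    kept = []
    for names in track_names_per_file:
        for name in names:
            if name not in seen:
                seen.add(name)
                kept.append(name)
    return kept


# Two files with different video bitrates but identical audio
files = [
    ["video_avc_500kbps", "audio_aac_eng_128kbps"],
    ["video_avc_800kbps", "audio_aac_eng_128kbps"],
]
print(deduplicate(files))
```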

Specifying the bitrate

The system-bitrate attribute specifies the bitrate of the stream in bits per second:

<video src="video.mp4" system-bitrate="2500000" />

Alternatively, the bitrates can be specified as <param> values, which allows separate video and audio bitrates for a file containing both:

<video src="videoAndAudio.mp4">
    <param name="videoBitrate" value="2500000"/>
    <param name="audioBitrate" value="128000"/>
</video>
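Both forms can be read from a <video> element with ElementTree. This hypothetical read_bitrates helper only collects the values (in bits per second); how ew-vodingest combines them when both forms are present is not specified here, so the sketch makes no precedence decision.

```python
import xml.etree.ElementTree as ET


def read_bitrates(video_xml: str) -> dict:
    """Collect the bitrate settings (bits per second) of one <video> element."""
    elem = ET.fromstring(video_xml)
    rates = {}
    # Attribute form: system-bitrate="2500000"
    system_bitrate = elem.get("system-bitrate")
    if system_bitrate is not None:
        rates["system-bitrate"] = int(system_bitrate)
    # <param> form: separate video and audio bitrates
    for param in elem.findall("param"):
        name = param.get("name")
        if name in ("videoBitrate", "audioBitrate"):
            rates[name] = int(param.get("value"))
    return rates


print(read_bitrates('<video src="video.mp4" system-bitrate="2500000" />'))
```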

Track selection and track-specific parameters

For finer control over which tracks are extracted, and with which parameters, extra parameters can be added.

In particular, the audioOnly and videoOnly params specify that only one type of media track should be extracted.

For example, to extract the video and audio of a single file as separate streams, each with its own bitrate, one could use the following snippet:

<video src="videoAndAudio.mp4" system-bitrate="2500000">
    <param name="videoOnly" value="TRUE"/>
</video>
<video src="videoAndAudio.mp4" system-bitrate="128000">
    <param name="audioOnly" value="TRUE"/>
</video>
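The effect of audioOnly and videoOnly can be sketched as selecting which media types to extract from one <video> element. The function selected_media_types is hypothetical; matching "TRUE" case-insensitively and rejecting both flags at once are choices made for this sketch, not documented behavior.

```python
import xml.etree.ElementTree as ET


def selected_media_types(video_xml: str) -> set:
    """Return which media types a <video> element asks to extract."""
    elem = ET.fromstring(video_xml)
    params = {p.get("name"): p.get("value", "") for p in elem.findall("param")}
    video_only = params.get("videoOnly", "").upper() == "TRUE"
    audio_only = params.get("audioOnly", "").upper() == "TRUE"
    if video_only and audio_only:
        # Assumption: this sketch treats the two flags as mutually exclusive.
        raise ValueError("videoOnly and audioOnly cannot both be TRUE")
    if video_only:
        return {"video"}
    if audio_only:
        return {"audio"}
    return {"video", "audio"}  # no flag: extract everything


print(selected_media_types('<video src="videoAndAudio.mp4"><param name="audioOnly" value="TRUE"/></video>'))
```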

In addition, an audioindex query parameter can be appended to src to extract a specific audio track. The audioindex value relates to the track IDs inside the mp4 file, but is zero-based: ?audioindex=0 refers to the audio track with the lowest track ID, ?audioindex=1 to the one with the second lowest, and so on.
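The mapping from audioindex to a track ID can be sketched in two steps: split the query off the src value, then sort the audio track IDs and index into them. The names parse_src and audio_track_id are hypothetical; the track IDs would in practice come from the mp4 file.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs


def parse_src(src: str) -> Tuple[str, Optional[int]]:
    """Split 'file.mp4?audioindex=N' into ('file.mp4', N); N is None if absent."""
    path, _, query = src.partition("?")
    values = parse_qs(query).get("audioindex")
    return path, (int(values[0]) if values else None)


def audio_track_id(audio_track_ids: list, audioindex: int) -> int:
    """Zero-based: audioindex 0 is the audio track with the lowest track ID."""
    return sorted(audio_track_ids)[audioindex]


# An mp4 whose audio tracks have track IDs 3 and 2 (video is track 1)
path, index = parse_src("hev1_aac_mc.mp4?audioindex=1")
print(path, audio_track_id([3, 2], index))
```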

Here is an example that extracts the first two audio tracks of the same file and sets the language, bitrate, and displayName for each of them:

<video src="hev1_aac_mc.mp4?audioindex=0" system-language="dan" audio-bitrate="256000">
    <param name="audioOnly" value="TRUE"/>
    <param name="displayName" value="Danish 6ch"/>
</video>
<video src="hev1_aac_mc.mp4?audioindex=1" system-language="dan" audio-bitrate="192000">
    <param name="audioOnly" value="TRUE"/>
    <param name="displayName" value="Danish 2ch"/>
</video>

Audio Language

The language for an audio stream can be set using the system-language attribute:

<video src="mp4:video3.mp4?audioindex=0" system-language="eng"/>

For legacy reasons, the attributes systemLanguage or language can be used instead.

If no language is specified in the SMIL file, the 3-letter language code in the mdhd box is used. It is in turn overridden by the optional elng box, which can contain any language code.
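The resulting precedence (SMIL attribute, then elng, then mdhd) can be sketched as below. The function resolve_audio_language is hypothetical, and the order in which the three SMIL attribute spellings are checked is an assumption.

```python
from typing import Optional


def resolve_audio_language(smil_attrs: dict, elng: Optional[str], mdhd: Optional[str]) -> Optional[str]:
    """SMIL attribute first, then the elng box, then the mdhd box."""
    # Assumption: the current and legacy attribute names are checked in this order.
    for key in ("system-language", "systemLanguage", "language"):
        if smil_attrs.get(key):
            return smil_attrs[key]
    if elng:
        return elng
    return mdhd


print(resolve_audio_language({}, "en-US", "eng"))
```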

Subtitle input

The supported input subtitle formats are TTML, WebVTT, STL, and SRT. In all cases, a complete side-loaded file is expected. As part of the ESF format, the subtitles are transformed into segmented wvtt. A complete WebVTT file is also generated and referenced in the generated DASH manifest. The output subtitle tracks are named according to the pattern

media type   pattern                          example
subtitles    subtitles_wvtt_<lang>_<role>     subtitles_wvtt_se_caption

The role can be either caption or subtitle. If no role is specified, it is left out of the track name.
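A minimal sketch of the subtitle naming rule, including the case without a role; subtitle_track_name is a hypothetical helper and rejecting other role values is a choice made for this sketch.

```python
from typing import Optional


def subtitle_track_name(lang: str, role: Optional[str] = None) -> str:
    """subtitles_wvtt_<lang>_<role>, or subtitles_wvtt_<lang> without a role."""
    if role is None:
        return f"subtitles_wvtt_{lang}"
    if role not in ("caption", "subtitle"):
        raise ValueError(f"unsupported role: {role}")
    return f"subtitles_wvtt_{lang}_{role}"


print(subtitle_track_name("se", "caption"))
print(subtitle_track_name("eng"))
```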

Subtitle streams in TTML, WebVTT, STL, or SRT files are specified with the <textstream> or <srt> tags. The language can be specified with the language attribute or, for <textstream>, with the system-language attribute:

<textstream src="subtitles.ttml" system-language="en" />
<srt src="swedish.srt" language="sv"/>

The format is auto-detected from the file extension, which must be one of:

 .ttml, .webvtt, .vtt, .stl, .srt
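Format detection by extension can be sketched as a lookup table. The helper detect_subtitle_format is hypothetical, and matching extensions case-insensitively is an assumption.

```python
from pathlib import PurePosixPath

# Allowed extensions from the list above, mapped to their formats
SUBTITLE_FORMATS = {
    ".ttml": "TTML",
    ".webvtt": "WebVTT",
    ".vtt": "WebVTT",
    ".stl": "STL",
    ".srt": "SRT",
}


def detect_subtitle_format(src: str) -> str:
    """Return the subtitle format implied by the file extension."""
    suffix = PurePosixPath(src).suffix.lower()  # assumption: case-insensitive
    try:
        return SUBTITLE_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unsupported subtitle extension: {src!r}") from None


print(detect_subtitle_format("subtitles.ttml"))
```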

TTML files containing multiple languages are not supported.

No bitrate is specified for text streams; it is always set to 1 kbps.

Extracting language from subtitle file name

If there is no SMIL file, or a subtitle entry has no language attribute, ew-vodingest tries to extract the language from the file name. The algorithm works like this:

  1. remove the file extension
  2. split the name on “-” characters
  3. if the last part is at most three characters long, use it as the language code

If no language is found, the subtitle languages are denoted “und”, “und1”, “und2”, etc.
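The three steps and the “und” fallback can be sketched as follows. The function names are hypothetical, treating an empty last part as "not found" is an assumption, and so is numbering the undetermined tracks in input order.

```python
from pathlib import PurePosixPath
from typing import Optional


def language_from_filename(filename: str) -> Optional[str]:
    stem = PurePosixPath(filename).stem   # 1. remove the file extension
    last = stem.split("-")[-1]            # 2. split on "-" and take the last part
    # 3. at most three characters (assumption: an empty part does not count)
    return last if 0 < len(last) <= 3 else None


def assign_languages(filenames: list) -> list:
    """Fall back to und, und1, und2, ... for names without a language."""
    result = []
    und_count = 0
    for name in filenames:
        lang = language_from_filename(name)
        if lang is None:
            lang = "und" if und_count == 0 else f"und{und_count}"
            und_count += 1
        result.append(lang)
    return result


print(assign_languages(["xyz-eng.srt", "xyz-spa.srt"]))
```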

Example SMIL files

In the following, we give examples to show some possible variations of supported SMIL files.

Example 1 - video and audio from all files

This example specifies width and height for the videos; that information is discarded. There are only two distinct combinations of language and bitrate for audio, so only two audio variants, audio_aac_eng_128kbps and audio_aac_eng_192kbps, are generated.

<?xml version="1.0" encoding="UTF-8"?>
<smil>
    <body>
        <switch>
            <video height="360" src="profile1.mp4" systemLanguage="eng" width="480">
                <param name="videoBitrate" value="500000"/>
                <param name="audioBitrate" value="128000"/>
            </video>
            <video height="480" src="profile2.mp4" systemLanguage="eng" width="720">
                <param name="videoBitrate" value="800000"/>
                <param name="audioBitrate" value="128000"/>
            </video>
            <video height="540" src="profile3.mp4" systemLanguage="eng" width="960">
                <param name="videoBitrate" value="1300000"/>
                <param name="audioBitrate" value="128000"/>
            </video>
            <video height="720" src="profile4.mp4" systemLanguage="eng" width="1280">
                <param name="videoBitrate" value="2300000"/>
                <param name="audioBitrate" value="192000"/>
            </video>
            <video height="1080" src="profile5.mp4" systemLanguage="eng" width="1920">
                <param name="videoBitrate" value="5000000"/>
                <param name="audioBitrate" value="192000"/>
            </video>
        </switch>
    </body>
</smil>
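The arithmetic behind the two audio variants in the example above can be checked with a short sketch. It uses a trimmed copy of the example (only systemLanguage and audioBitrate are kept), the helper audio_variants is hypothetical, and the codec "aac" is an assumption since it is read from the mp4 files rather than the SMIL file.

```python
import xml.etree.ElementTree as ET

# Trimmed version of Example 1: only the attributes that affect audio naming
EXAMPLE_1 = """<smil><body><switch>
<video src="profile1.mp4" systemLanguage="eng"><param name="audioBitrate" value="128000"/></video>
<video src="profile2.mp4" systemLanguage="eng"><param name="audioBitrate" value="128000"/></video>
<video src="profile3.mp4" systemLanguage="eng"><param name="audioBitrate" value="128000"/></video>
<video src="profile4.mp4" systemLanguage="eng"><param name="audioBitrate" value="192000"/></video>
<video src="profile5.mp4" systemLanguage="eng"><param name="audioBitrate" value="192000"/></video>
</switch></body></smil>"""


def audio_variants(smil_text: str, codec: str = "aac") -> list:
    root = ET.fromstring(smil_text)
    names = []
    for video in root.iter("video"):
        lang = video.get("system-language") or video.get("systemLanguage")
        params = {p.get("name"): p.get("value") for p in video.findall("param")}
        kbps = int(params["audioBitrate"]) // 1000
        name = f"audio_{codec}_{lang}_{kbps}kbps"
        if name not in names:  # identical names collapse into one variant
            names.append(name)
    return names


print(audio_variants(EXAMPLE_1))
```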

Example 2 - audioindex queries

This example shows extraction of audio tracks using the audioindex query parameter. The mp4: “scheme” prefix is not needed, but is supported for legacy reasons, as is the mp4:/// prefix.

<?xml version="1.0"?>
<smil>
    <body>
        <switch>
            <video src="mp4:video1.mp4?audioindex=0" system-language="eng" audio-bitrate="96000">
                <param name="audioOnly" value="TRUE"/>
            </video>
            <video src="mp4:video1.mp4?audioindex=1" system-language="ger" audio-bitrate="96000">
                <param name="audioOnly" value="TRUE"/>
            </video>
            <video src="video2.mp4" system-bitrate="2000000">
                <param name="videoOnly" value="TRUE"/>
            </video>
            <video src="video1.mp4" system-bitrate="5000000">
                <param name="videoOnly" value="TRUE"/>
            </video>
            <textstream src="subtitles.ttml" system-language="eng">
            </textstream>
        </switch>
    </body>
</smil>
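Handling the legacy prefixes in the example above amounts to stripping them before the file is opened. This is a sketch under an explicit assumption: strip_legacy_prefix is hypothetical, and mapping mp4:///video1.mp4 to the same relative path as mp4:video1.mp4 is assumed rather than documented. The longer prefix is checked first so that mp4:/// is not mistaken for mp4: followed by a path.

```python
def strip_legacy_prefix(src: str) -> str:
    """Remove a legacy mp4:/// or mp4: prefix; other values pass through."""
    # Check the longer prefix first (assumption: both map to a relative path).
    for prefix in ("mp4:///", "mp4:"):
        if src.startswith(prefix):
            return src[len(prefix):]
    return src


print(strip_legacy_prefix("mp4:video1.mp4?audioindex=0"))
```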

Example 3 - displayName and role parameters

This example uses the displayName parameter for audio and subtitles, and the role parameter for subtitles.

<?xml version="1.0" encoding="utf-8"?>
<smil>
  <body>
    <switch>
      <video src="video800.mp4" system-bitrate="800000">
        <param name="videoOnly" value="TRUE"/>
      </video>
      <video src="video400.mp4" system-bitrate="400000">
        <param name="videoOnly" value="TRUE"/>
      </video>
      <video src="audio.mp4?audioindex=0" system-language="eng" audio-bitrate="256000">
        <param name="audioOnly" value="TRUE"/>
        <param name="displayName" value="English 6ch"/>
      </video>
      <video src="audio.mp4?audioindex=1" system-language="eng" audio-bitrate="192000">
        <param name="audioOnly" value="TRUE"/>
        <param name="displayName" value="English 2ch"/>
      </video>
      <srt src="swe.srt" language="swe">
        <param name="displayName" value="svenska"/>
        <param name="role" value="subtitle"/>
      </srt>
      <srt src="swe_cc.srt" language="swe">
        <param name="displayName" value="svenska (CC)"/>
        <param name="role" value="caption"/>
      </srt>
      <textstream src="eng.stl" system-language="eng">
        <param name="displayName" value="English"/>
        <param name="role" value="caption"/>
      </textstream>
    </switch>
  </body>
</smil>
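Applying the subtitle naming pattern to the example above shows why role matters: both Swedish tracks share a language, and only the role keeps their names distinct. This sketch uses a copy of the example's subtitle elements; subtitle_tracks is a hypothetical helper that reads language from either the language or the system-language attribute.

```python
import xml.etree.ElementTree as ET

# Subtitle elements copied from Example 3
SUBTITLES = """<switch>
  <srt src="swe.srt" language="swe">
    <param name="displayName" value="svenska"/>
    <param name="role" value="subtitle"/>
  </srt>
  <srt src="swe_cc.srt" language="swe">
    <param name="displayName" value="svenska (CC)"/>
    <param name="role" value="caption"/>
  </srt>
  <textstream src="eng.stl" system-language="eng">
    <param name="displayName" value="English"/>
    <param name="role" value="caption"/>
  </textstream>
</switch>"""


def subtitle_tracks(switch_xml: str) -> list:
    """Return (track name, displayName) for each subtitle element."""
    tracks = []
    for elem in ET.fromstring(switch_xml):
        if elem.tag not in ("srt", "textstream"):
            continue
        lang = elem.get("language") or elem.get("system-language")
        params = {p.get("name"): p.get("value") for p in elem.findall("param")}
        role = params.get("role")
        name = f"subtitles_wvtt_{lang}" + (f"_{role}" if role else "")
        tracks.append((name, params.get("displayName")))
    return tracks


for track in subtitle_tracks(SUBTITLES):
    print(track)
```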

Example 4 - the simplest possible SMIL file - same as no SMIL

This example shows a SMIL file where all parameters are extracted automatically. Here, all video sources contain audio with the language set to eng in the mdhd box of the mp4 files, and the subtitle languages can be extracted from the file names.

<?xml version="1.0" encoding="UTF-8"?>
<smil>
  <body>
    <switch>
      <video src="0.mp4"/>
      <video src="1.mp4"/>
      <video src="2.mp4"/>
      <video src="3.mp4"/>
      <srt src="xyz-eng.srt"/>
      <srt src="xyz-spa.srt"/>
    </switch>
  </body>
</smil>

The generated tracks are:

  • 4 video tracks with different bitrates
  • 2 audio tracks with different bitrates, but the same language “eng”
  • 2 subtitle tracks with language codes “eng” and “spa”

In this particular case, the SMIL file only lists the files and adds no extra parameters. Therefore, it is also possible to give the directory containing these files as input to ew-vodingest and get exactly the same result.