Introduction
ESF is Edgeware’s CMAF-based storage format for both live and VoD content. It stores media data in CMAF tracks, accompanied by separate metadata files. It supports H.264/AVC and H.265/HEVC video, AAC and AC-3 audio, and wvtt subtitles.
Asset media data
The media data is stored as CMAF tracks in files with extensions “.cmfv”, “.cmfa”, and “.cmft”, for video, audio, and subtitles, respectively. The subtitle format is “wvtt” which is much more storage-efficient than “stpp”.
Asset metadata
The metadata for an asset is stored in two places:
- a single content_info.json file describing the tracks
- segment info (.dat) files describing the segments, one per track
Content info
Content info is stored in a file named content_info.json inside the asset directory.
It is a JSON file containing enough information about the tracks of the asset to
fill in all information about media tracks in HLS, DASH, or MSS manifests.
This includes codecs, languages, time scales, bitrates, etc. However, it does
not contain information about individual segments. Such information is stored in segment info files.
Segment info
Metadata about the segments is stored in segment info files with extension .dat.
There is exactly one such file for each media file.
These files describe the individual segments of the CMAF tracks, with one 32-byte entry for each segment:
Field | Type | Description |
---|---|---|
Nr | uint32 | Segment number |
Time | uint64 | Time (normally presentationTime=DecodeTime) |
Dur | uint32 | Duration of segment in timescale specified in content info |
Size | uint32 | Size in bytes |
Offset | uint64 | Byte offset inside track file |
Rest | uint32 | Flags for SCTE-markers and other information |
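The entry layout above can be read with a few lines of code. The following is a minimal sketch, not the reference implementation: it assumes the fields are packed in table order without padding and stored big-endian (as in ISO BMFF boxes), which the text above does not state. The names ENTRY, SegmentEntry, and parse_segment_info are hypothetical.

```python
import struct
from typing import List, NamedTuple

# ASSUMPTION: big-endian, fields in table order, no padding.
# uint32 + uint64 + uint32 + uint32 + uint64 + uint32 = 32 bytes.
ENTRY = struct.Struct(">IQIIQI")
assert ENTRY.size == 32


class SegmentEntry(NamedTuple):
    nr: int      # Segment number
    time: int    # Time (normally presentation time = decode time)
    dur: int     # Duration in the timescale given in content info
    size: int    # Size in bytes
    offset: int  # Byte offset inside the track file
    rest: int    # Flags for SCTE markers and other information


def parse_segment_info(data: bytes) -> List[SegmentEntry]:
    """Parse the full contents of a .dat file into a list of entries."""
    if len(data) % ENTRY.size != 0:
        raise ValueError("segment info length is not a multiple of 32 bytes")
    return [SegmentEntry(*fields) for fields in ENTRY.iter_unpack(data)]
```

With such a list, a segment's bytes would be the slice of the track file from offset to offset + size.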
For VoD assets, an init segment is stored at the start of the file.
Its size is given by the Offset of the first media segment in the segment info file.
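As an illustration of this rule, the init segment of a VoD track can be cut out using only the first entry of the segment info file. This is a sketch under the same unstated assumptions as any reader of the 32-byte entry would need: big-endian fields in table order, which places Offset at byte 20 of the entry. The function names are hypothetical.

```python
import struct

ENTRY_SIZE = 32
# Nr (4) + Time (8) + Dur (4) + Size (4) bytes precede the Offset field.
OFFSET_POS = 20


def init_segment_size(dat_bytes: bytes) -> int:
    """Return the Offset of the first media segment, i.e. the init segment size."""
    if len(dat_bytes) < ENTRY_SIZE:
        raise ValueError("segment info file contains no entries")
    # ASSUMPTION: Offset is stored as a big-endian uint64.
    (offset,) = struct.unpack_from(">Q", dat_bytes, OFFSET_POS)
    return offset


def read_init_segment(track_bytes: bytes, dat_bytes: bytes) -> bytes:
    """Slice the init segment from the start of a VoD track file."""
    return track_bytes[: init_segment_size(dat_bytes)]
```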
Commonalities with DASH OnDemand format
DASH OnDemand stores media data in the same type of CMAF tracks as ESF. However, the metadata is stored in other structures.
To describe the asset and its variants, there is a manifest called the Media Presentation Description (MPD), which is an XML file with file extension .mpd.
It is similar to the ESF content_info.json file. It has explicit switching groups called Adaptation Sets, but lacks some information that ESF provides, such as the video parameter sets needed to generate MSS manifests.
Similar to ESF, there is a second structure that contains the information about the
segments. In the case of DASH OnDemand, this information is stored in a sidx box
inside the CMAF track itself. Its position is at the beginning of the media file,
right after the init segment and before the actual media data.
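The placement of the sidx box can be checked by walking the top-level boxes of a track file. The sketch below relies only on the generic ISO BMFF box header (a 32-bit size followed by a four-character type, with size 1 signalling a 64-bit size and size 0 meaning "to end of file"). The function name is hypothetical, and the sketch does not parse the sidx payload itself.

```python
import struct
from typing import Iterator, Tuple


def iter_top_level_boxes(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, start offset, total size) for each top-level box."""
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            if pos + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, pos + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the file.
            size = len(data) - pos
        if size < header:
            raise ValueError(f"invalid box size {size} at offset {pos}")
        yield box_type.decode("latin-1"), pos, size
        pos += size
```

For a DASH OnDemand track, the expected order is the init segment boxes (ftyp, moov), then sidx, then the moof/mdat pairs of the media segments.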
By generating a DASH MPD file and inserting a sidx box in the media tracks,
it is possible to make VoD assets which are compatible with both DASH OnDemand and
the ESF format. The new ESB3031 ew-vodingest tool generates such combined ingested
files. For DASH OnDemand, a complete WebVTT file is used instead of wvtt segments.
That complete WebVTT file is generated by extracting and concatenating all subtitle
cues from the wvtt subtitle tracks used in the ESF format.
Live storage using ESF
The main reason to use ESF instead of DASH OnDemand is that the latter does not support
live content, but requires a static structure. In fact, since the sidx box must be placed
before the media, it is not even possible to concatenate media segments and write a
sidx box at the end. With ESF we can have the same format for both VoD and live.
The segment info files are separated from the media files and use 32 bytes per segment. They are therefore easy to seek in, and easy to grow by just appending another 32 bytes for each new segment. In the ESB3003 catchup buffer, we store live content in one-minute files.
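The fixed entry size is what makes both operations cheap: the entry at index i starts at byte i * 32, and growing the file is a plain append. A minimal sketch follows, assuming big-endian fields in the documented order (the byte order is an assumption, not stated above) and using hypothetical function names:

```python
import io
import struct
from typing import BinaryIO, Tuple

# ASSUMPTION: big-endian, fields in the documented order, no padding.
ENTRY = struct.Struct(">IQIIQI")  # Nr, Time, Dur, Size, Offset, Rest


def read_entry(f: BinaryIO, index: int) -> Tuple[int, ...]:
    """Random access: seek straight to the entry at the given index."""
    f.seek(index * ENTRY.size)
    raw = f.read(ENTRY.size)
    if len(raw) != ENTRY.size:
        raise IndexError(f"no segment info entry at index {index}")
    return ENTRY.unpack(raw)


def append_entry(f: BinaryIO, nr: int, time: int, dur: int,
                 size: int, offset: int, rest: int = 0) -> None:
    """Grow the file by one 32-byte entry for a newly written segment."""
    f.seek(0, io.SEEK_END)
    f.write(ENTRY.pack(nr, time, dur, size, offset, rest))
```

Because no existing bytes are rewritten on append, a reader could follow a live file by re-reading only the entries past the last index it has seen.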