Parsers

A parser extracts structured information as a tree from a container given as a file-like object. It performs type conversion when the type is explicit but does not interpret the data any further. Parsers can raise a ParserError.

EBML

EBML (Extensible Binary Meta Language) is used by Matroska and WebM.

Element types

enzyme.parsers.ebml.INTEGER

Signed integer element type

enzyme.parsers.ebml.UINTEGER

Unsigned integer element type

enzyme.parsers.ebml.FLOAT

Float element type

enzyme.parsers.ebml.STRING

ASCII-encoded string element type

enzyme.parsers.ebml.UNICODE

UTF-8-encoded string element type

enzyme.parsers.ebml.DATE

Date element type

enzyme.parsers.ebml.BINARY

Binary element type

enzyme.parsers.ebml.MASTER

Container element type

Main interface

enzyme.parsers.ebml.SPEC_TYPES

Specification types to Element types mapping

enzyme.parsers.ebml.READERS

Mapping of Element types to reader functions. See Readers.

You can override a reader to use one of your choice here:

>>> from enzyme.parsers.ebml import BINARY, READERS
>>> def my_binary_reader(stream, size):
...     # Return the raw bytes of the element's data
...     data = stream.read(size)
...     return data
>>> READERS[BINARY] = my_binary_reader

class enzyme.parsers.ebml.Element(id=None, type=None, name=None, level=None, position=None, size=None, data=None)

Base object of EBML

Parameters:
  • id (int) – id of the element, best represented as hexadecimal (0x18538067 for Matroska Segment element)
  • type (INTEGER, UINTEGER, FLOAT, STRING, UNICODE, DATE, MASTER or BINARY) – type of the element
  • name (string) – name of the element
  • level (int) – level of the element
  • position (int) – position of element’s data
  • size (int) – size of element’s data
  • data – data as read by the corresponding READERS

class enzyme.parsers.ebml.MasterElement(id=None, name=None, level=None, position=None, size=None, data=None)

Element of type MASTER that has a list of Element as its data

Parameters:
  • id (int) – id of the element, best represented as hexadecimal (0x18538067 for Matroska Segment element)
  • name (string) – name of the element
  • level (int) – level of the element
  • position (int) – position of element’s data
  • size (int) – size of element’s data
  • data (list of Element) – child elements

MasterElement implements some magic methods to ease manipulation. Thus, a MasterElement supports the in keyword to test for the presence of a child element by its name and gives access to it with a container getter:

>>> from enzyme.parsers.ebml import get_matroska_specs, parse
>>> ebml_element = parse(open('test1.mkv', 'rb'), get_matroska_specs())[0]
>>> 'EBMLVersion' in ebml_element
False
>>> 'DocType' in ebml_element
True
>>> ebml_element['DocType']
Element(DocType, u'matroska')

load(stream, specs, ignore_element_types=None, ignore_element_names=None, max_level=None)

Load children Elements with a level lower than or equal to max_level from the stream according to the specs

Parameters:
  • stream – file-like object from which to read
  • specs (dict) – see Specifications
  • ignore_element_types (list) – list of element types to ignore
  • ignore_element_names (list) – list of element names to ignore
  • max_level (int) – maximum level for children elements

get(name, default=None)

Convenience method for master_element[name].data if name in master_element else default

Parameters:
  • name (string) – the name of the child to get
  • default – default value if name is not in the MasterElement
Returns:

the data of the child Element or default
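
The lookup behaviour described for get() can be illustrated with a minimal stand-in written for Python 3. SketchElement and SketchMaster are hypothetical names used only for this illustration; they are not enzyme's classes. Returning the first matching child and raising KeyError for a missing name are assumptions of the sketch.

```python
class SketchElement:
    """Hypothetical stand-in for Element: only a name and some data."""

    def __init__(self, name, data):
        self.name = name
        self.data = data


class SketchMaster(SketchElement):
    """Hypothetical stand-in for MasterElement: data is a list of children."""

    def __contains__(self, name):
        # Supports: 'DocType' in master
        return any(child.name == name for child in self.data)

    def __getitem__(self, name):
        # Supports: master['DocType'] (assumption: first match wins)
        for child in self.data:
            if child.name == name:
                return child
        raise KeyError(name)

    def get(self, name, default=None):
        # Same as: master[name].data if name in master else default
        return self[name].data if name in self else default


header = SketchMaster('EBML', [SketchElement('DocType', 'matroska')])
doc_type = header.get('DocType')         # data of an existing child
missing = header.get('EBMLVersion', 1)   # falls back to the default
```

Note that get() returns the child's data, while the container getter returns the child Element itself.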

enzyme.parsers.ebml.parse(stream, specs, size=None, ignore_element_types=None, ignore_element_names=None, max_level=None)

Parse a stream for size bytes according to the specs

Parameters:
  • stream – file-like object from which to read
  • specs (dict) – see Specifications
  • size (int or None) – maximum number of bytes to read, None to read all the stream
  • ignore_element_types (list) – list of element types to ignore
  • ignore_element_names (list) – list of element names to ignore
  • max_level (int) – maximum level of elements
Returns:

parsed data as a tree of Element

Return type:

list

Note

If size is reached in the middle of an element, reading will continue until the element is fully parsed.
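
The effect of this note can be sketched with a toy format in Python 3. The format below (a 1-byte length followed by the payload) and the function sketch_parse are hypothetical and unrelated to EBML or to enzyme's code; the sketch only shows how a size limit that is checked between elements lets the last element run past the limit.

```python
import io


def sketch_parse(stream, size=None):
    """Toy parser: each element is a 1-byte length followed by its payload."""
    start = stream.tell()
    elements = []
    # The limit is only checked between elements, never inside one
    while size is None or stream.tell() - start < size:
        header = stream.read(1)
        if not header:
            break  # end of stream
        elements.append(stream.read(header[0]))
    return elements


stream = io.BytesIO(b'\x03abc\x03def\x03ghi')
partial = sketch_parse(stream, size=5)
consumed = stream.tell()
```

With size=5 the limit falls inside the second element, so that element is still read completely and the stream ends up at offset 8.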

enzyme.parsers.ebml.parse_element(stream, specs, load_children=False, ignore_element_types=None, ignore_element_names=None, max_level=None)

Extract a single Element from the stream according to the specs

Parameters:
  • stream – file-like object from which to read
  • specs (dict) – see Specifications
  • load_children (bool) – load children elements if the parsed element is a MasterElement
  • ignore_element_types (list) – list of element types to ignore
  • ignore_element_names (list) – list of element names to ignore
  • max_level (int) – maximum level for children elements
Returns:

the parsed element

Return type:

Element

enzyme.parsers.ebml.get_matroska_specs(webm_only=False)

Get the Matroska specs

Parameters: webm_only (bool) – load only WebM specs
Returns: the specs in the appropriate format. See Specifications
Return type: dict

Readers

enzyme.parsers.ebml.readers.read_element_id(stream)

Read the Element ID

Parameters: stream – file-like object from which to read
Raises: ReadError – when not all the required bytes could be read
Returns: the id of the element
Return type: int

enzyme.parsers.ebml.readers.read_element_size(stream)

Read the Element Size

Parameters: stream – file-like object from which to read
Raises: ReadError – when not all the required bytes could be read
Returns: the size of element’s data
Return type: int
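
For readers unfamiliar with EBML variable-length integers, here is a minimal Python 3 sketch of how an Element ID and an Element Size are laid out. It is an illustration, not enzyme's implementation: the names sketch_read_id, sketch_read_size and the helpers are hypothetical, the built-in EOFError and ValueError stand in for the library's own exceptions, and the reserved "unknown size" value is not handled.

```python
import io


def _read_exactly(stream, count):
    """Read count bytes or fail (stand-in for the ReadError case)."""
    data = stream.read(count)
    if len(data) != count:
        raise EOFError('expected {} bytes, got {}'.format(count, len(data)))
    return data


def _vint_length(first_byte, max_length):
    """Return (length, marker_mask) derived from the first byte."""
    length = 1
    mask = 0x80
    # Each zero bit before the marker bit adds one byte to the length
    while mask and not first_byte & mask:
        length += 1
        mask >>= 1
    if length > max_length:
        raise ValueError('invalid length descriptor 0x{:02x}'.format(first_byte))
    return length, mask


def sketch_read_id(stream):
    """Element ID: up to 4 bytes, the marker bit stays part of the ID."""
    first = _read_exactly(stream, 1)
    length, _ = _vint_length(first[0], 4)
    return int.from_bytes(first + _read_exactly(stream, length - 1), 'big')


def sketch_read_size(stream):
    """Element Size: up to 8 bytes, the marker bit is stripped."""
    first = _read_exactly(stream, 1)
    length, mask = _vint_length(first[0], 8)
    value = first[0] & (mask - 1)  # keep only the bits below the marker
    for byte in _read_exactly(stream, length - 1):
        value = (value << 8) | byte
    return value


# EBML header ID with a 1-byte size, then Segment ID with a 2-byte size
raw = bytes([0x1A, 0x45, 0xDF, 0xA3, 0x82, 0x18, 0x53, 0x80, 0x67, 0x40, 0x02])
stream = io.BytesIO(raw)
first_id = sketch_read_id(stream)
first_size = sketch_read_size(stream)
second_id = sketch_read_id(stream)
second_size = sketch_read_size(stream)
```

The second ID read here is 0x18538067, the Matroska Segment ID quoted in the Element parameters above.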

enzyme.parsers.ebml.readers.read_element_integer(stream, size)

Read the Element Data of type INTEGER

Parameters:
  • stream – file-like object from which to read
  • size (int) – size of element’s data
Raises:
  • ReadError – when not all the required bytes could be read
  • SizeError – if size is incorrect
Returns:

the read integer

Return type:

int

enzyme.parsers.ebml.readers.read_element_uinteger(stream, size)

Read the Element Data of type UINTEGER

Parameters:
  • stream – file-like object from which to read
  • size (int) – size of element’s data
Raises:
  • ReadError – when not all the required bytes could be read
  • SizeError – if size is incorrect
Returns:

the read unsigned integer

Return type:

int
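
EBML stores integers big-endian in 0 to 8 bytes, so the signed and unsigned readers differ only in how the same bytes are interpreted. The Python 3 sketch below illustrates that difference; sketch_read_integer is a hypothetical name, not enzyme's code, and the built-in ValueError and EOFError stand in for SizeError and ReadError.

```python
import io


def sketch_read_integer(stream, size, signed):
    """Read a big-endian integer of size bytes from the stream."""
    if not 0 <= size <= 8:
        raise ValueError('integer size must be between 0 and 8 bytes')
    data = stream.read(size)
    if len(data) != size:
        raise EOFError('expected {} bytes, got {}'.format(size, len(data)))
    # A zero-length integer decodes to 0
    return int.from_bytes(data, 'big', signed=signed)


as_signed = sketch_read_integer(io.BytesIO(b'\xff\xfe'), 2, signed=True)
as_unsigned = sketch_read_integer(io.BytesIO(b'\xff\xfe'), 2, signed=False)
```

The same two bytes give -2 when read as INTEGER and 65534 when read as UINTEGER.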

enzyme.parsers.ebml.readers.read_element_float(stream, size)

Read the Element Data of type FLOAT

Parameters:
  • stream – file-like object from which to read
  • size (int) – size of element’s data
Raises:
  • ReadError – when not all the required bytes could be read
  • SizeError – if size is incorrect
Returns:

the read float

Return type:

float
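
EBML floats are big-endian IEEE 754 values of 4 or 8 bytes. A minimal Python 3 sketch of such a reader follows; sketch_read_float is a hypothetical name, the built-in exceptions stand in for SizeError and ReadError, and which sizes enzyme itself accepts is not stated here, so the check is an assumption.

```python
import io
import struct


def sketch_read_float(stream, size):
    """Read a 4-byte or 8-byte big-endian IEEE 754 float."""
    if size not in (4, 8):
        raise ValueError('float size must be 4 or 8 bytes')
    data = stream.read(size)
    if len(data) != size:
        raise EOFError('expected {} bytes, got {}'.format(size, len(data)))
    # '>f' is single precision, '>d' is double precision
    return struct.unpack('>f' if size == 4 else '>d', data)[0]


single = sketch_read_float(io.BytesIO(struct.pack('>f', 1.5)), 4)
double = sketch_read_float(io.BytesIO(struct.pack('>d', 0.1)), 8)
```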

enzyme.parsers.ebml.readers.read_element_string(stream, size)

Read the Element Data of type STRING

Parameters:
  • stream – file-like object from which to read
  • size (int) – size of element’s data
Raises:
  • ReadError – when not all the required bytes could be read
  • SizeError – if size is incorrect
Returns:

the ASCII-decoded string

Return type:

unicode

enzyme.parsers.ebml.readers.read_element_unicode(stream, size)

Read the Element Data of type UNICODE

Parameters:
  • stream – file-like object from which to read
  • size (int) – size of element’s data
Raises:
  • ReadError – when not all the required bytes could be read
  • SizeError – if size is incorrect
Returns:

the UTF-8-decoded string

Return type:

unicode
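
The STRING and UNICODE readers differ only in the codec applied to the bytes. The Python 3 sketch below shows the idea; sketch_read_text is a hypothetical name, not enzyme's code, and EOFError stands in for ReadError. Stripping trailing NUL padding is an assumption of the sketch: the EBML format allows such padding, but whether enzyme removes it is not stated here.

```python
import io


def sketch_read_text(stream, size, encoding):
    """Read size bytes and decode them with the given codec."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError('expected {} bytes, got {}'.format(size, len(data)))
    # Assumption: drop NUL padding before decoding
    return data.rstrip(b'\x00').decode(encoding)


doc_type = sketch_read_text(io.BytesIO(b'matroska'), 8, 'ascii')
title_bytes = 'café'.encode('utf-8')  # 5 bytes for 4 characters
title = sketch_read_text(io.BytesIO(title_bytes), len(title_bytes), 'utf-8')
```

The size argument counts bytes, not characters, which matters for UTF-8 data.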

enzyme.parsers.ebml.readers.read_element_date(stream, size)

Read the Element Data of type DATE

Parameters:
  • stream – file-like object from which to read
  • size (int) – size of element’s data
Raises:
  • ReadError – when not all the required bytes could be read
  • SizeError – if size is incorrect
Returns:

the read date

Return type:

datetime
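
An EBML date is a signed 8-byte integer counting nanoseconds since 2001-01-01T00:00:00 UTC. The Python 3 sketch below converts such a value to a datetime; sketch_read_date is a hypothetical name, not enzyme's code, the built-in exceptions stand in for SizeError and ReadError, and returning a naive datetime meant as UTC is an assumption.

```python
import io
from datetime import datetime, timedelta

# Start of the EBML date epoch (naive datetime, read as UTC)
EBML_EPOCH = datetime(2001, 1, 1)


def sketch_read_date(stream, size):
    """Read a signed 8-byte nanosecond offset from the EBML epoch."""
    if size != 8:
        raise ValueError('date size must be 8 bytes')
    data = stream.read(size)
    if len(data) != size:
        raise EOFError('expected {} bytes, got {}'.format(size, len(data)))
    nanoseconds = int.from_bytes(data, 'big', signed=True)
    # datetime has microsecond resolution, so sub-microsecond digits are lost
    return EBML_EPOCH + timedelta(microseconds=nanoseconds // 1000)


one_day = (86400 * 10**9).to_bytes(8, 'big', signed=True)
epoch = sketch_read_date(io.BytesIO(b'\x00' * 8), 8)
next_day = sketch_read_date(io.BytesIO(one_day), 8)
```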

enzyme.parsers.ebml.readers.read_element_binary(stream, size)

Read the Element Data of type BINARY

Parameters:
  • stream – file-like object from which to read
  • size (int) – size of element’s data
Raises:
  • ReadError – when not all the required bytes could be read
  • SizeError – if size is incorrect
Returns:

raw binary data

Return type:

bytes

Specifications

The XML specification for Matroska can be found here. It is included with enzyme and can be converted to the appropriate format with get_matroska_specs().

The specs parameter of parse(), parse_element() and load() is a dict of the form {id: (type, name, level)}.
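
A small hand-written dict in that shape may make the format concrete. In this Python 3 sketch the type markers are placeholder strings defined locally; enzyme exposes its own constants (MASTER, STRING, UINTEGER and so on) whose actual values are not shown in this document. The helper describe is hypothetical. The IDs, names and levels are those of the corresponding Matroska elements.

```python
# Placeholder type markers for this sketch only
MASTER = 'master'
STRING = 'string'
UINTEGER = 'uinteger'

# {id: (type, name, level)}
sketch_specs = {
    0x1A45DFA3: (MASTER, 'EBML', 0),
    0x4286: (UINTEGER, 'EBMLVersion', 1),
    0x4282: (STRING, 'DocType', 1),
    0x18538067: (MASTER, 'Segment', 0),
}


def describe(element_id, specs):
    """Look up an element ID and format its spec entry."""
    element_type, name, level = specs[element_id]
    return '{} (type={}, level={})'.format(name, element_type, level)


segment = describe(0x18538067, sketch_specs)
```

Keying the dict by element ID lets a parser resolve the type, name and level of an element as soon as its ID has been read.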