Parsers

A parser extracts structured information, as a tree, from a container given as a file-like object. It performs type conversion when the type is explicit but does not interpret the data any further. Parsers can raise a ParserError.

EBML

EBML (Extensible Binary Meta Language) is used by Matroska and WebM.

Element types

enzyme.parsers.ebml.INTEGER: Signed integer element type

enzyme.parsers.ebml.UINTEGER: Unsigned integer element type

enzyme.parsers.ebml.FLOAT: Float element type

enzyme.parsers.ebml.STRING: ASCII-encoded string element type

enzyme.parsers.ebml.UNICODE: UTF-8-encoded string element type

enzyme.parsers.ebml.DATE: Date element type

enzyme.parsers.ebml.BINARY: Binary element type

enzyme.parsers.ebml.MASTER: Container element type

Main interface

enzyme.parsers.ebml.SPEC_TYPES: Specification types to Element types mapping

enzyme.parsers.ebml.READERS

Element types to reader functions mapping. See Readers.

You can override a reader to use one of your choice here:

>>> def my_binary_reader(stream, size):
...     data = stream.read(size)
...     return data
>>> READERS[BINARY] = my_binary_reader
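
As the example above shows, a reader takes the stream and the size of the element's data. The following is a minimal sketch of an alternative binary reader that skips the payload instead of loading it, which may help with large binary elements. It assumes a seekable stream and that the caller does not need the binary data; skipping_binary_reader is a hypothetical name and is not part of enzyme. It would be registered in the same way, with READERS[BINARY] = skipping_binary_reader.

```python
import io


def skipping_binary_reader(stream, size):
    """Skip `size` bytes of binary payload instead of reading them."""
    # Seek relative to the current position so that the next element
    # starts where the parser expects it; nothing is loaded into memory.
    stream.seek(size, io.SEEK_CUR)
    return None


# Stand-in stream: 4 bytes of "binary" payload followed by 2 other bytes.
stream = io.BytesIO(b"\x00\x01\x02\x03\xaa\xbb")
result = skipping_binary_reader(stream, 4)
position_after = stream.tell()
```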

class enzyme.parsers.ebml.Element(id=None, type=None, name=None, level=None, position=None, size=None, data=None)

Base object of EBML

Parameters:

id (int) – id of the element, best represented as hexadecimal (0x18538067 for Matroska Segment element)
type (INTEGER, UINTEGER, FLOAT, STRING, UNICODE, DATE, MASTER or BINARY) – type of the element
name (string) – name of the element
level (int) – level of the element
position (int) – position of element’s data
size (int) – size of element’s data
data – data as read by the corresponding READERS

class enzyme.parsers.ebml.MasterElement(id=None, name=None, level=None, position=None, size=None, data=None)

Element of type MASTER that has a list of Element as its data

Parameters:

id (int) – id of the element, best represented as hexadecimal (0x18538067 for Matroska Segment element)
name (string) – name of the element
level (int) – level of the element
position (int) – position of element’s data
size (int) – size of element’s data
data (list of Element) – child elements

MasterElement implements some magic methods to ease manipulation. Thus, a MasterElement supports the in keyword to test for the presence of a child element by its name and gives access to it with a container getter:

>>> ebml_element = parse(open('test1.mkv', 'rb'), get_matroska_specs())[0]
>>> 'EBMLVersion' in ebml_element
False
>>> 'DocType' in ebml_element
True
>>> ebml_element['DocType']
Element(DocType, u'matroska')
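
The behaviour shown above can be approximated with the __contains__ and __getitem__ magic methods. This is only a sketch with stand-in classes (SketchElement and SketchMaster are hypothetical names); raising KeyError for a missing child is an assumption, and enzyme's actual implementation may differ.

```python
class SketchElement:
    """Stand-in for an element: just a name and some data."""

    def __init__(self, name, data):
        self.name = name
        self.data = data


class SketchMaster(SketchElement):
    """Stand-in for a master element whose data is a list of children."""

    def __contains__(self, name):
        # Enables: 'DocType' in master
        return any(child.name == name for child in self.data)

    def __getitem__(self, name):
        # Enables: master['DocType']
        for child in self.data:
            if child.name == name:
                return child
        raise KeyError(name)  # assumption: a missing child raises KeyError


header = SketchMaster("EBML", [SketchElement("DocType", "matroska")])
has_doctype = "DocType" in header
has_version = "EBMLVersion" in header
doctype = header["DocType"].data
```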

load(stream, specs, ignore_element_types=None, ignore_element_names=None, max_level=None)

Load children Elements with a level lower than or equal to max_level from the stream according to the specs

Parameters:

stream – file-like object from which to read
specs (dict) – see Specifications
max_level (int) – maximum level for children elements
ignore_element_types (list) – list of element types to ignore
ignore_element_names (list) – list of element names to ignore

get(name, default=None)

Convenience method for master_element[name].data if name in master_element else default

Parameters:

name (string) – the name of the child to get
default – default value if name is not in the MasterElement

Returns:

the data of the child Element or default

enzyme.parsers.ebml.parse(stream, specs, size=None, ignore_element_types=None, ignore_element_names=None, max_level=None)

Parse a stream for size bytes according to the specs

Parameters:

stream – file-like object from which to read
size (int or None) – maximum number of bytes to read, None to read all the stream
specs (dict) – see Specifications
ignore_element_types (list) – list of element types to ignore
ignore_element_names (list) – list of element names to ignore
max_level (int) – maximum level of elements

Returns:

parsed data as a tree of Element

Return type:

list

Note

If size is reached in the middle of an element, reading continues until that element is fully parsed.
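
The note above can be illustrated with a toy format. The sketch below does not parse EBML: it assumes records made of a 1-byte id, a 1-byte length and a payload, and parse_toy is a hypothetical function. It only shows how a byte budget checked between elements lets the last element run past the limit.

```python
import io


def parse_toy(stream, size=None):
    """Parse (id, payload) records until `size` bytes have been consumed."""
    start = stream.tell()
    elements = []
    # The budget is only checked between elements, so an element that
    # starts before the limit is always read completely.
    while size is None or stream.tell() - start < size:
        header = stream.read(2)
        if len(header) < 2:
            break  # end of stream
        element_id, length = header[0], header[1]
        elements.append((element_id, stream.read(length)))
    return elements


data = b"\x01\x02ab" + b"\x02\x03cde" + b"\x03\x01f"
# The 6-byte budget ends inside the second record (bytes 4 to 8).
limited = parse_toy(io.BytesIO(data), size=6)
everything = parse_toy(io.BytesIO(data))
```

Here limited contains the first two records, the second one in full, while everything contains all three.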

enzyme.parsers.ebml.parse_element(stream, specs, load_children=False, ignore_element_types=None, ignore_element_names=None, max_level=None)

Extract a single Element from the stream according to the specs

Parameters:

stream – file-like object from which to read
specs (dict) – see Specifications
load_children (bool) – load children elements if the parsed element is a MasterElement
ignore_element_types (list) – list of element types to ignore
ignore_element_names (list) – list of element names to ignore
max_level (int) – maximum level for children elements

Returns:

the parsed element

Return type:

Element

enzyme.parsers.ebml.get_matroska_specs(webm_only=False)

Get the Matroska specs

Parameters: webm_only (bool) – load only WebM specs
Returns: the specs in the appropriate format. See Specifications
Return type: dict

Readers

enzyme.parsers.ebml.readers.read_element_id(stream)

Read the Element ID

Parameters: stream – file-like object from which to read
Raises: ReadError – when not all the required bytes could be read
Returns: the id of the element
Return type: int
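
For reference, an EBML Element ID is a variable-length integer whose length is given by the number of leading zero bits in its first byte, and the marker bit is kept as part of the id. The sketch below decodes ids of up to 4 bytes, as used by Matroska. It is not enzyme's code: read_id_sketch is a hypothetical name, and EOFError and ValueError stand in for the library's own exceptions.

```python
import io


def read_id_sketch(stream):
    """Decode an EBML Element ID of 1 to 4 bytes, marker bit included."""
    first = stream.read(1)
    if len(first) != 1:
        raise EOFError("no data for the element id")
    # Length is the number of leading zero bits in the first byte, plus one.
    length, mask = 1, 0x80
    while length <= 4 and not first[0] & mask:
        length += 1
        mask >>= 1
    if length > 4:
        raise ValueError("invalid element id")
    rest = stream.read(length - 1)
    if len(rest) != length - 1:
        raise EOFError("truncated element id")
    # The id keeps its marker bit, so all bytes are used as-is.
    return int.from_bytes(first + rest, "big")


ebml_id = read_id_sketch(io.BytesIO(b"\x1a\x45\xdf\xa3"))
doctype_id = read_id_sketch(io.BytesIO(b"\x42\x82"))
```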

enzyme.parsers.ebml.readers.read_element_size(stream)

Read the Element Size

Parameters: stream – file-like object from which to read
Raises: ReadError – when not all the required bytes could be read
Returns: the size of element’s data
Return type: int
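
The Element Size uses the same variable-length encoding, up to 8 bytes, but the marker bit is cleared before the value is assembled. A minimal sketch, under the same caveats: read_size_sketch is a hypothetical name, the exceptions are stand-ins, and the reserved "unknown size" value is not handled.

```python
import io


def read_size_sketch(stream):
    """Decode an EBML Element Size of 1 to 8 bytes, marker bit removed."""
    first = stream.read(1)
    if len(first) != 1:
        raise EOFError("no data for the element size")
    # Length is the number of leading zero bits in the first byte, plus one.
    length, mask = 1, 0x80
    while length <= 8 and not first[0] & mask:
        length += 1
        mask >>= 1
    if length > 8:
        raise ValueError("invalid element size")
    rest = stream.read(length - 1)
    if len(rest) != length - 1:
        raise EOFError("truncated element size")
    # Keep only the bits below the marker, then append the remaining bytes.
    value = first[0] & (mask - 1)
    for byte in rest:
        value = (value << 8) | byte
    return value


one_byte = read_size_sketch(io.BytesIO(b"\x82"))
two_bytes = read_size_sketch(io.BytesIO(b"\x40\x02"))
eight_bytes = read_size_sketch(io.BytesIO(b"\x01\x00\x00\x00\x00\x00\x00\x0a"))
```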

enzyme.parsers.ebml.readers.read_element_integer(stream, size)

Read the Element Data of type INTEGER

Parameters:

stream – file-like object from which to read
size (int) – size of element’s data

Raises:

ReadError – when not all the required bytes could be read
SizeError – if size is incorrect

Returns:

the read integer

Return type:

int

enzyme.parsers.ebml.readers.read_element_uinteger(stream, size)

Read the Element Data of type UINTEGER

Parameters:

stream – file-like object from which to read
size (int) – size of element’s data

Raises:

ReadError – when not all the required bytes could be read
SizeError – if size is incorrect

Returns:

the read unsigned integer

Return type:

int
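
EBML stores INTEGER and UINTEGER data as big-endian values of 0 to 8 bytes, so the same bytes decode differently depending on the type. The following sketch shows the difference with int.from_bytes (Python 3). read_int_sketch is a hypothetical helper, not enzyme's reader, and its exceptions are stand-ins for ReadError and SizeError.

```python
import io


def read_int_sketch(stream, size, signed):
    """Read a big-endian integer of `size` bytes (0 to 8)."""
    if not 0 <= size <= 8:
        raise ValueError("integer size must be between 0 and 8")
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("not all the required bytes could be read")
    # An empty payload decodes to 0.
    return int.from_bytes(data, "big", signed=signed)


as_signed = read_int_sketch(io.BytesIO(b"\xff\xfe"), 2, signed=True)
as_unsigned = read_int_sketch(io.BytesIO(b"\xff\xfe"), 2, signed=False)
empty = read_int_sketch(io.BytesIO(b""), 0, signed=False)
```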

enzyme.parsers.ebml.readers.read_element_float(stream, size)

Read the Element Data of type FLOAT

Parameters:

stream – file-like object from which to read
size (int) – size of element’s data

Raises:

ReadError – when not all the required bytes could be read
SizeError – if size is incorrect

Returns:

the read float

Return type:

float
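
In EBML, a FLOAT is a big-endian IEEE 754 value of 4 or 8 bytes, and an empty payload stands for 0.0. A minimal sketch using the struct module follows; read_float_sketch is a hypothetical name, and which sizes enzyme's reader accepts is not stated here, so the size check is an assumption.

```python
import io
import struct


def read_float_sketch(stream, size):
    """Read a big-endian float of 0, 4 or 8 bytes."""
    if size == 0:
        return 0.0
    if size not in (4, 8):
        raise ValueError("float size must be 0, 4 or 8")
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("not all the required bytes could be read")
    # '>f' is a 4-byte float, '>d' an 8-byte double, both big-endian.
    return struct.unpack(">f" if size == 4 else ">d", data)[0]


single = read_float_sketch(io.BytesIO(struct.pack(">f", 1.5)), 4)
double = read_float_sketch(io.BytesIO(struct.pack(">d", 0.1)), 8)
```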

enzyme.parsers.ebml.readers.read_element_string(stream, size)

Read the Element Data of type STRING

Parameters:

stream – file-like object from which to read
size (int) – size of element’s data

Raises:

ReadError – when not all the required bytes could be read
SizeError – if size is incorrect

Returns:

the read ASCII-decoded string

Return type:

unicode

enzyme.parsers.ebml.readers.read_element_unicode(stream, size)

Read the Element Data of type UNICODE

Parameters:

stream – file-like object from which to read
size (int) – size of element’s data

Raises:

ReadError – when not all the required bytes could be read
SizeError – if size is incorrect

Returns:

the read UTF-8-decoded string

Return type:

unicode
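
STRING and UNICODE data differ only in their encoding. The sketch below decodes both with a single hypothetical helper, read_text_sketch. The EBML specification allows strings to be padded with null bytes; whether enzyme's readers strip that padding is an assumption here. In Python 3 the result is a str, the equivalent of the unicode type listed above.

```python
import io


def read_text_sketch(stream, size, encoding):
    """Read `size` bytes and decode them, ignoring null padding."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("not all the required bytes could be read")
    # Assumption: the text ends at the first null byte, if there is one.
    data = data.split(b"\x00", 1)[0]
    return data.decode(encoding)


doctype = read_text_sketch(io.BytesIO(b"matroska\x00\x00"), 10, "ascii")
raw_title = "caf\u00e9".encode("utf-8")
title = read_text_sketch(io.BytesIO(raw_title), len(raw_title), "utf-8")
```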

enzyme.parsers.ebml.readers.read_element_date(stream, size)

Read the Element Data of type DATE

Parameters:

stream – file-like object from which to read
size (int) – size of element’s data

Raises:

ReadError – when not all the required bytes could be read
SizeError – if size is incorrect

Returns:

the read date

Return type:

datetime
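
An EBML DATE is a signed 8-byte integer counting nanoseconds since 2001-01-01T00:00:00 UTC. The following sketch converts it to a naive datetime. read_date_sketch is a hypothetical name, only the 8-byte form is handled, and the precision is truncated to microseconds because datetime cannot hold nanoseconds.

```python
import io
from datetime import datetime, timedelta

EBML_EPOCH = datetime(2001, 1, 1)  # start of the EBML date scale (UTC)


def read_date_sketch(stream, size):
    """Read an 8-byte EBML date and return it as a datetime."""
    if size != 8:
        raise ValueError("date size must be 8")
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("not all the required bytes could be read")
    nanoseconds = int.from_bytes(data, "big", signed=True)
    # datetime has microsecond resolution, so drop the last three digits.
    return EBML_EPOCH + timedelta(microseconds=nanoseconds // 1000)


one_day = (86400 * 10**9).to_bytes(8, "big", signed=True)
next_day = read_date_sketch(io.BytesIO(one_day), 8)
one_second_before = read_date_sketch(
    io.BytesIO((-10**9).to_bytes(8, "big", signed=True)), 8)
```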

enzyme.parsers.ebml.readers.read_element_binary(stream, size)

Read the Element Data of type BINARY

Parameters:

stream – file-like object from which to read
size (int) – size of element’s data

Raises:

ReadError – when not all the required bytes could be read
SizeError – if size is incorrect

Returns:

raw binary data

Return type:

bytes

Specifications

The XML specification for Matroska is maintained by the Matroska project. A copy is included with enzyme and can be converted to the appropriate format with get_matroska_specs().

The specs parameter of parse(), parse_element() and load() must be a dict of the form {id: (type, name, level)}.
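
As an illustration of that format, here is a small hand-written specs dict for a few well-known Matroska elements. The type markers are plain stand-in strings so that the sketch runs on its own; with enzyme, the element type constants from enzyme.parsers.ebml would be used instead. describe is a hypothetical helper.

```python
# Stand-in type markers (assumption: replace with enzyme's constants).
MASTER, STRING, UINTEGER = "master", "string", "uinteger"

# {id: (type, name, level)}
specs = {
    0x1A45DFA3: (MASTER, "EBML", 0),
    0x4286: (UINTEGER, "EBMLVersion", 1),
    0x4282: (STRING, "DocType", 1),
    0x18538067: (MASTER, "Segment", 0),
}


def describe(element_id):
    """Return an indented one-line description of a known element id."""
    element_type, name, level = specs[element_id]
    return "%s%s (%s)" % ("  " * level, name, element_type)


doctype_line = describe(0x4282)
segment_line = describe(0x18538067)
```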