Parsers
A parser extracts structured information as a tree from a container given as a file-like object.
It performs type conversion when the type is explicit but does not interpret anything else.
Parsers can raise a ParserError.
EBML
EBML (Extensible Binary Meta Language) is used by Matroska and WebM.
Element types
- enzyme.parsers.ebml.INTEGER
  Signed integer element type
- enzyme.parsers.ebml.UINTEGER
  Unsigned integer element type
- enzyme.parsers.ebml.FLOAT
  Float element type
- enzyme.parsers.ebml.STRING
  ASCII-encoded string element type
- enzyme.parsers.ebml.UNICODE
  UTF-8-encoded string element type
- enzyme.parsers.ebml.DATE
  Date element type
- enzyme.parsers.ebml.BINARY
  Binary element type
- enzyme.parsers.ebml.MASTER
  Container element type
Main interface
- enzyme.parsers.ebml.SPEC_TYPES
  Specification types to Element types mapping
- enzyme.parsers.ebml.READERS
  Element types to reader functions mapping. See Readers
  You can override a reader to use one of your choice here:
  >>> from enzyme.parsers.ebml import BINARY, READERS
  >>> def my_binary_reader(stream, size):
  ...     data = stream.read(size)
  ...     return data
  >>> READERS[BINARY] = my_binary_reader
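A reader is simply a callable taking (stream, size), so an override can do something other than return the bytes. The sketch below registers a reader that skips binary payloads instead of loading them into memory. To stay runnable without enzyme, it defines local stand-ins named BINARY and READERS; these are placeholders for illustration, not the library's actual objects or values.

```python
import io

# Stand-ins for enzyme.parsers.ebml.BINARY and READERS so the sketch runs
# on its own; with enzyme installed you would import the real names instead.
BINARY = "binary"
READERS = {}


def skipping_binary_reader(stream, size):
    """Skip `size` bytes of binary data and return None instead of the payload."""
    stream.seek(size, io.SEEK_CUR)
    return None


READERS[BINARY] = skipping_binary_reader

# Four payload bytes followed by data belonging to the next element.
stream = io.BytesIO(b"\x00\x01\x02\x03rest")
result = READERS[BINARY](stream, 4)
remaining = stream.read()
```

Whatever the reader does, it should leave the stream positioned right after the element's data so that the next element can be read.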
- class enzyme.parsers.ebml.Element(id=None, type=None, name=None, level=None, position=None, size=None, data=None)
  Base object of EBML
  Parameters:
  - id (int) – id of the element, best represented as hexadecimal (0x18538067 for Matroska Segment element)
  - type (INTEGER, UINTEGER, FLOAT, STRING, UNICODE, DATE, MASTER or BINARY) – type of the element
  - name (string) – name of the element
  - level (int) – level of the element
  - position (int) – position of element’s data
  - size (int) – size of element’s data
  - data – data as read by the corresponding READERS
- class enzyme.parsers.ebml.MasterElement(id=None, name=None, level=None, position=None, size=None, data=None)
  Element of type MASTER that has a list of Element as its data
  Parameters:
  - id (int) – id of the element, best represented as hexadecimal (0x18538067 for Matroska Segment element)
  - name (string) – name of the element
  - level (int) – level of the element
  - position (int) – position of element’s data
  - size (int) – size of element’s data
  - data (list of Element) – child elements
  MasterElement implements some magic methods to ease manipulation. Thus, a MasterElement supports the in keyword to test for the presence of a child element by its name and gives access to it with a container getter:
  >>> ebml_element = parse(open('test1.mkv', 'rb'), get_matroska_specs())[0]
  >>> 'EBMLVersion' in ebml_element
  False
  >>> 'DocType' in ebml_element
  True
  >>> ebml_element['DocType']
  Element(DocType, u'matroska')
- load(stream, specs, ignore_element_types=None, ignore_element_names=None, max_level=None)
  Load children Elements with a level lower than or equal to max_level from the stream according to the specs
  Parameters:
  - stream – file-like object from which to read
  - specs (dict) – see Specifications
  - max_level (int) – maximum level for children elements
  - ignore_element_types (list) – list of element types to ignore
  - ignore_element_names (list) – list of element names to ignore
- get(name, default=None)
  Convenience method for master_element[name].data if name in master_element else default
  Parameters:
  - name (string) – the name of the child to get
  - default – default value if name is not in the MasterElement
  Returns: the data of the child Element or default
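The container behaviour described for MasterElement (the in keyword, the container getter, and get) can be pictured with a small stand-alone sketch. ToyElement and ToyMasterElement are hypothetical classes written for this illustration and are not enzyme's implementation; raising KeyError for a missing child is an assumption.

```python
class ToyElement:
    """Hypothetical stand-in for Element: just a name and some data."""

    def __init__(self, name, data=None):
        self.name = name
        self.data = data


class ToyMasterElement(ToyElement):
    """Hypothetical stand-in for MasterElement: data is a list of children."""

    def __contains__(self, name):
        # Supports: 'DocType' in master
        return any(child.name == name for child in self.data)

    def __getitem__(self, name):
        # Supports: master['DocType']
        for child in self.data:
            if child.name == name:
                return child
        # Assumption: a missing child is reported with KeyError.
        raise KeyError(name)

    def get(self, name, default=None):
        # Same shape as the documented convenience expression.
        return self[name].data if name in self else default


ebml = ToyMasterElement("EBML", [ToyElement("DocType", "matroska")])
doc_type = ebml.get("DocType")
missing = ebml.get("EBMLVersion", 1)
```

Here get returns the child's data rather than the child itself, which mirrors the documented expression master_element[name].data.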
- enzyme.parsers.ebml.parse(stream, specs, size=None, ignore_element_types=None, ignore_element_names=None, max_level=None)
  Parse a stream for size bytes according to the specs
  Parameters:
  - stream – file-like object from which to read
  - size (int or None) – maximum number of bytes to read, None to read all the stream
  - specs (dict) – see Specifications
  - ignore_element_types (list) – list of element types to ignore
  - ignore_element_names (list) – list of element names to ignore
  - max_level (int) – maximum level of elements
  Returns: parsed data as a tree of Element
  Return type: list
  Note: If size is reached in the middle of an element, reading will continue until the element is fully parsed.
- enzyme.parsers.ebml.parse_element(stream, specs, load_children=False, ignore_element_types=None, ignore_element_names=None, max_level=None)
  Extract a single Element from the stream according to the specs
  Parameters:
  - stream – file-like object from which to read
  - specs (dict) – see Specifications
  - load_children (bool) – load children elements if the parsed element is a MasterElement
  - ignore_element_types (list) – list of element types to ignore
  - ignore_element_names (list) – list of element names to ignore
  - max_level (int) – maximum level for children elements
  Returns: the parsed element
  Return type: Element
- enzyme.parsers.ebml.get_matroska_specs(webm_only=False)
  Get the Matroska specs
  Parameters: webm_only (bool) – load only WebM specs
  Returns: the specs in the appropriate format. See Specifications
  Return type: dict
Readers
- enzyme.parsers.ebml.readers.read_element_id(stream)
  Read the Element ID
  Parameters: stream – file-like object from which to read
  Raises: ReadError – when not all the required bytes could be read
  Returns: the id of the element
  Return type: int
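For orientation, an EBML Element ID is a variable-length integer: the number of leading zero bits in the first byte gives the total length, and the marker bit stays part of the ID (which is why the Segment ID reads 0x18538067). The sketch below shows that decoding under the assumption of IDs up to 4 bytes. read_id_sketch is a hypothetical name, and the built-in EOFError stands in for the library's ReadError; this is not enzyme's code.

```python
import io


def read_id_sketch(stream):
    """Read an EBML Element ID: a variable-length integer, marker bit kept."""
    first = stream.read(1)
    if len(first) != 1:
        raise EOFError("could not read the first ID byte")
    byte = first[0]
    # Count leading zero bits: each one adds a byte to the ID.
    length = 1
    mask = 0x80
    while length <= 4 and not byte & mask:
        length += 1
        mask >>= 1
    if length > 4:
        raise ValueError("Element IDs longer than 4 bytes are not handled here")
    rest = stream.read(length - 1)
    if len(rest) != length - 1:
        raise EOFError("could not read the full ID")
    # The marker bit is kept, so the raw bytes are the ID.
    return int.from_bytes(first + rest, "big")


segment_id = read_id_sketch(io.BytesIO(b"\x18\x53\x80\x67"))
doctype_id = read_id_sketch(io.BytesIO(b"\x42\x82"))
```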
- enzyme.parsers.ebml.readers.read_element_size(stream)
  Read the Element Size
  Parameters: stream – file-like object from which to read
  Raises: ReadError – when not all the required bytes could be read
  Returns: the size of element’s data
  Return type: int
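The Element Size uses the same variable-length scheme as the ID, except that the length marker bit is cleared before the value is assembled, and up to 8 bytes are allowed. A minimal sketch, with the hypothetical name read_size_sketch and built-in exceptions standing in for the library's own; it does not handle the reserved "unknown size" encoding.

```python
import io


def read_size_sketch(stream):
    """Read an EBML Element Size: a variable-length integer, marker bit removed."""
    first = stream.read(1)
    if len(first) != 1:
        raise EOFError("could not read the first size byte")
    byte = first[0]
    # Count leading zero bits to find the total length (1 to 8 bytes).
    length = 1
    mask = 0x80
    while length <= 8 and not byte & mask:
        length += 1
        mask >>= 1
    if length > 8:
        raise ValueError("invalid size: first byte is zero")
    rest = stream.read(length - 1)
    if len(rest) != length - 1:
        raise EOFError("could not read the full size")
    # Clear the length marker bit, then append the remaining bytes.
    value = byte & (mask - 1)
    for b in rest:
        value = (value << 8) | b
    return value


one_byte = read_size_sketch(io.BytesIO(b"\x82"))
two_bytes = read_size_sketch(io.BytesIO(b"\x40\x02"))
four_bytes = read_size_sketch(io.BytesIO(b"\x10\x00\x00\x2a"))
```

The first two inputs encode the same size with different widths, which EBML permits.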
- enzyme.parsers.ebml.readers.read_element_integer(stream, size)
  Read the Element Data of type INTEGER
  Parameters:
  - stream – file-like object from which to read
  - size (int) – size of element’s data
  Raises:
  - ReadError – when not all the required bytes could be read
  - SizeError – if size is incorrect
  Returns: the read integer
  Return type: int
- enzyme.parsers.ebml.readers.read_element_uinteger(stream, size)
  Read the Element Data of type UINTEGER
  Parameters:
  - stream – file-like object from which to read
  - size (int) – size of element’s data
  Raises:
  - ReadError – when not all the required bytes could be read
  - SizeError – if size is incorrect
  Returns: the read unsigned integer
  Return type: int
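EBML stores both integer types big-endian, with signed values in two's complement, so the two readers differ only in how the same bytes are interpreted. A minimal sketch of that difference, using the hypothetical helper read_int_sketch and the built-in EOFError in place of the library's ReadError:

```python
import io


def read_int_sketch(stream, size, signed):
    """Read `size` bytes and interpret them as a big-endian integer."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("expected %d bytes, got %d" % (size, len(data)))
    return int.from_bytes(data, "big", signed=signed)


# The same two bytes read as INTEGER and as UINTEGER.
as_signed = read_int_sketch(io.BytesIO(b"\xff\xfe"), 2, signed=True)
as_unsigned = read_int_sketch(io.BytesIO(b"\xff\xfe"), 2, signed=False)
```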
- enzyme.parsers.ebml.readers.read_element_float(stream, size)
  Read the Element Data of type FLOAT
  Parameters:
  - stream – file-like object from which to read
  - size (int) – size of element’s data
  Raises:
  - ReadError – when not all the required bytes could be read
  - SizeError – if size is incorrect
  Returns: the read float
  Return type: float
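EBML floats are big-endian IEEE 754 values, so the size selects between single and double precision. The sketch below assumes only 4-byte and 8-byte data is accepted and rejects anything else; read_float_sketch is a hypothetical name, and ValueError and EOFError stand in for the library's SizeError and ReadError.

```python
import io
import struct


def read_float_sketch(stream, size):
    """Read a big-endian IEEE 754 float of 4 or 8 bytes."""
    if size not in (4, 8):
        raise ValueError("float data must be 4 or 8 bytes, got %d" % size)
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("expected %d bytes, got %d" % (size, len(data)))
    fmt = ">f" if size == 4 else ">d"
    return struct.unpack(fmt, data)[0]


single = read_float_sketch(io.BytesIO(struct.pack(">f", 0.25)), 4)
double = read_float_sketch(io.BytesIO(struct.pack(">d", 1.5)), 8)
```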
- enzyme.parsers.ebml.readers.read_element_string(stream, size)
  Read the Element Data of type STRING
  Parameters:
  - stream – file-like object from which to read
  - size (int) – size of element’s data
  Raises:
  - ReadError – when not all the required bytes could be read
  - SizeError – if size is incorrect
  Returns: the read ascii-decoded string
  Return type: unicode
- enzyme.parsers.ebml.readers.read_element_unicode(stream, size)
  Read the Element Data of type UNICODE
  Parameters:
  - stream – file-like object from which to read
  - size (int) – size of element’s data
  Raises:
  - ReadError – when not all the required bytes could be read
  - SizeError – if size is incorrect
  Returns: the read utf-8-decoded string
  Return type: unicode
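The two string readers differ only in the codec applied to the bytes. The sketch below illustrates that with one hypothetical helper, read_text_sketch. EBML allows string data to be padded with zero bytes; whether the library strips that padding is not stated here, so the stripping step is an assumption of this sketch.

```python
import io


def read_text_sketch(stream, size, encoding):
    """Read `size` bytes and decode them, dropping any zero-byte padding."""
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("expected %d bytes, got %d" % (size, len(data)))
    # Assumption: everything from the first zero byte onward is padding.
    return data.split(b"\x00", 1)[0].decode(encoding)


# STRING data is ASCII, UNICODE data is UTF-8.
doc_type = read_text_sketch(io.BytesIO(b"matroska"), 8, "ascii")
title = read_text_sketch(io.BytesIO("café\x00\x00".encode("utf-8")), 7, "utf-8")
```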
- enzyme.parsers.ebml.readers.read_element_date(stream, size)
  Read the Element Data of type DATE
  Parameters:
  - stream – file-like object from which to read
  - size (int) – size of element’s data
  Raises:
  - ReadError – when not all the required bytes could be read
  - SizeError – if size is incorrect
  Returns: the read date
  Return type: datetime
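In EBML, a date is a signed 8-byte integer counting nanoseconds from 2001-01-01T00:00:00 UTC. The following sketch converts such a value to a naive datetime (to be read as UTC). read_date_sketch and EBML_EPOCH are hypothetical names, and truncating to microseconds is a choice made here because datetime cannot hold nanoseconds; the library may behave differently.

```python
import io
from datetime import datetime, timedelta

# EBML dates count from the start of the third millennium (UTC).
EBML_EPOCH = datetime(2001, 1, 1)


def read_date_sketch(stream, size):
    """Read an 8-byte signed nanosecond offset and return a naive UTC datetime."""
    if size != 8:
        raise ValueError("date data must be 8 bytes, got %d" % size)
    data = stream.read(size)
    if len(data) != size:
        raise EOFError("expected %d bytes, got %d" % (size, len(data)))
    nanoseconds = int.from_bytes(data, "big", signed=True)
    # datetime has microsecond resolution, so sub-microsecond detail is dropped.
    return EBML_EPOCH + timedelta(microseconds=nanoseconds // 1000)


epoch = read_date_sketch(io.BytesIO(bytes(8)), 8)
one_second_later = read_date_sketch(
    io.BytesIO((1_000_000_000).to_bytes(8, "big", signed=True)), 8
)
```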
- enzyme.parsers.ebml.readers.read_element_binary(stream, size)
  Read the Element Data of type BINARY
  Parameters:
  - stream – file-like object from which to read
  - size (int) – size of element’s data
  Raises:
  - ReadError – when not all the required bytes could be read
  - SizeError – if size is incorrect
  Returns: raw binary data
  Return type: bytes
Specifications
The XML specification for Matroska can be found here.
It is included with enzyme and can be converted to the appropriate format with get_matroska_specs().
The appropriate format of the specs parameter for parse(), parse_element() and load() is {id: (type, name, level)}.
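As an illustration of the {id: (type, name, level)} shape, here is a tiny hand-written specs dictionary and a lookup helper. The three IDs are well-known Matroska element IDs, but the type constants are stand-in strings and describe is a hypothetical helper; real specs would come from get_matroska_specs().

```python
# Stand-in type constants (hypothetical values; enzyme exposes its own).
STRING, MASTER = "string", "master"

# Each entry maps an element id to (type, name, level).
specs = {
    0x1A45DFA3: (MASTER, "EBML", 0),
    0x4282: (STRING, "DocType", 1),
    0x18538067: (MASTER, "Segment", 0),
}


def describe(element_id, specs):
    """Return 'name (type, level N)' for a known id, or None if it is unknown."""
    entry = specs.get(element_id)
    if entry is None:
        return None
    element_type, name, level = entry
    return "%s (%s, level %d)" % (name, element_type, level)


doc_type_entry = describe(0x4282, specs)
unknown_entry = describe(0x1234, specs)
```

Keying the dictionary by id means that a parser can resolve the type, name and level of an element with a single lookup as soon as the id has been read.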