I'm trying to use Hachoir to retrieve metadata from a video file. It works reasonably well, except when I use 'get' or similar to return the width and height values.
I assumed it would be:
metadata.get('width')
But this throws an error (object does not have 'width' property).
When I run the following:
for data in sorted(metadata):
    if len(data.values) > 0:
        print data.key, data.values[0].value
All that is returned is the information from the "Common" Group.
When I use:
metadata.exportPlaintext()
... the information from "Common", "Video stream" and "Audio stream" is returned. I could simply parse over the resulting 'text' item and strip out the height and width values, but I would rather try to do it properly using metadata.get('width') or similar.
Looking at the source code, I thought I could use the following:
for key, metadata in metadata.__groups.iteritems():
to iterate through the groups in the metadata, but it then throws an "'AsfMetadata' object has no attribute '_groups'" error, which I'm sure shouldn't be the case, as I thought 'AsfMetadata' was a subclass of MultipleMetadata(), which does have such a variable.
Probably missing something quite obvious.
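One likely explanation for the missing attribute is Python's name mangling: an attribute written with two leading underscores inside a class body is stored under the name _ClassName__attr, so neither '_groups' nor '__groups' exists on the instance. The sketch below uses hypothetical stand-in classes (MultipleDemo and AsfDemo are not part of Hachoir) and assumes the real attribute is defined the same way.

```python
class MultipleDemo(object):
    """Hypothetical stand-in for a base class; not the real Hachoir class."""

    def __init__(self):
        # Two leading underscores inside a class body trigger name mangling:
        # the attribute is actually stored as _MultipleDemo__groups.
        self.__groups = {"video[1]": "Video stream #1"}


class AsfDemo(MultipleDemo):
    """Hypothetical subclass; it inherits the mangled attribute unchanged."""


demo = AsfDemo()
print(hasattr(demo, "_groups"))     # the single-underscore name was never set
print(hasattr(demo, "__groups"))    # the unmangled name does not exist either
print(demo._MultipleDemo__groups)   # the mangled name is what is really stored
```

The mangled name uses the class where the attribute was defined, not the subclass. Reaching into it still means relying on a private attribute, so a public accessor is preferable where one exists.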
This seems less straightforward for a WMV file. I have turned the metadata for such videos into a defaultdict, and it is more straightforward to get the image width now:
from collections import defaultdict
from pprint import pprint
from hachoir_metadata import metadata
from hachoir_core.cmd_line import unicodeFilename
from hachoir_parser import createParser
# using this example http://archive.org/details/WorkToFishtestwmv
filename = './test_wmv.wmv'
filename, realname = unicodeFilename(filename), filename
parser = createParser(filename)
# See what keys you can extract
for k, v in metadata.extractMetadata(parser)._Metadata__data.iteritems():
    if v.values:
        print v.key, v.values[0].value
# Turn the tags into a defaultdict
metalist = metadata.extractMetadata(parser).exportPlaintext()
meta = defaultdict(defaultdict)
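for item in metalist:
    if item.endswith(':'):
        # Group header line, e.g. "Video stream #1:"
        k = item[:-1]
    else:
        # Split on the first ': ' only, so values containing ': ' stay intact
        tag, value = item.split(': ', 1)
        tag = tag[2:]  # drop the leading "- "
        meta[k][tag] = value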
print meta['Video stream #1']['Image width'] # 320 pixels
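The same grouping idea can be tried without a media file. The sketch below is only an illustration: parse_plaintext is a hypothetical helper name, and the sample lines are made up to match the "Header:" and "- tag: value" layout that the code above expects. It also shows one way to turn a value such as "320 pixels" into an integer.

```python
def parse_plaintext(lines):
    """Group '- tag: value' lines under the most recent 'Header:' line."""
    groups = {}
    current = None
    for line in lines:
        if line.startswith('- '):
            if current is None:
                continue  # value line before any header; nothing to attach it to
            # partition splits on the first ': ' only
            tag, _, value = line[2:].partition(': ')
            current[tag] = value
        elif line.endswith(':'):
            current = groups.setdefault(line[:-1], {})
    return groups


# Made-up sample input, shaped like the plaintext export used above.
sample = [
    'Common:',
    '- Duration: 10 sec',
    'Video stream #1:',
    '- Image width: 320 pixels',
    '- Image height: 240 pixels',
]

meta = parse_plaintext(sample)
# The value is text such as "320 pixels"; keep only the leading number.
width = int(meta['Video stream #1']['Image width'].split()[0])
print(width)
```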
To get width x height from the first top-level metadata group that has the size info in the media file, without accessing private attributes and without parsing the text output, you could use file_metadata.iterGroups():
#!/usr/bin/env python
import sys
from itertools import chain
# $ pip install hachoir-{core,parser,metadata}
from hachoir_core.cmd_line import unicodeFilename
from hachoir_metadata import extractMetadata
from hachoir_parser import createParser
file_metadata = extractMetadata(createParser(unicodeFilename(sys.argv[1])))
it = chain([file_metadata], file_metadata.iterGroups())
# Pick the first group that has both keys; has() avoids an error on a missing key
print("%sx%s" % next((metadata.get('width'), metadata.get('height'))
                     for metadata in it
                     if metadata.has('width') and metadata.has('height')))
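If no group carries both keys, next() without a default raises StopIteration. A minimal sketch of the same search with a fallback follows; FakeGroup and first_size are hypothetical names, and FakeGroup only imitates the has() and get() calls used above so that the logic can run without Hachoir installed.

```python
from itertools import chain


class FakeGroup(object):
    """Hypothetical stand-in exposing only has() and get()."""

    def __init__(self, **values):
        self._values = values

    def has(self, key):
        return key in self._values

    def get(self, key):
        return self._values[key]


def first_size(groups):
    """Return (width, height) from the first group that has both, else None."""
    return next(((g.get('width'), g.get('height'))
                 for g in groups
                 if g.has('width') and g.has('height')), None)


root = FakeGroup(duration='0:00:10')
streams = [FakeGroup(bit_rate=128000), FakeGroup(width=320, height=240)]

size = first_size(chain([root], streams))
print("%sx%s" % size if size else "no size information found")
```

With real metadata, the same function could be called as first_size(chain([file_metadata], file_metadata.iterGroups())).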
To convert metadata into a dictionary (non-recursively, i.e., iterate groups manually if needed):
def metadata_as_dict(metadata):
    # Items with several values map to a list; items with one value map to that value
    return {item.key: ([v.value for v in item.values]
                       if len(item.values) > 1
                       else item.values[0].value)
            for item in metadata if item.values}