I am a complete beginner in Python, and would like to start learning it by doing. Namely, I'd love to correct some EXIF information in a huge bunch of family photos I have. To start with, I want to just get this information out of JPEG files properly.
Some of them have a title written in EXIF. It can be read like this, for example:
import pyexiv2
metadata = pyexiv2.ImageMetadata(filename)
metadata.read()
title = metadata['Exif.Image.XPTitle']
That much works. The problem is that some of the titles contain Cyrillic letters. If I run print title.human_value
I get for example
`Милой Мамуле от Майи, 11 ÑÐ½Ð²Ð°Ñ€Ñ 1944.`
while print title gives
<Exif.Image.XPTitle [Byte] = 28 4 56 4 59 4 62 4 57 4 32 0 28 4 48 4 60 4 67 4 59 4 53 4 32 0 62 4 66 4 32 0 28 4 48 4 57 4 56 4 44 0 32 0 49 0 49 0 32 0 79 4 61 4 50 4 48 4 64 4 79 4 32 0 49 0 57 0 52 0 52 0 46 0 0 0>
The actual string I'd love to see is
Милой Мамуле от Майи, 11 января 1944.
It seems to be a Unicode problem, but after trying a dozen different methods found here and elsewhere, I still cannot get it to work. Is it possible to see Russian letters in the console at all? I am using python(xy) on Windows 7 (English), so my IDE is spyder2. Just the default installation, to which I added pyexiv2. TIA!
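The bytes are UTF-16, little-endian: each character takes two bytes, low byte first, so 28 4 is U+041C, the letter М. The trailing 0 0 is a NUL terminator, which is why the decoded string ends in \x00.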
In Python 3:
>>> b = [28, 4, 56, 4, 59, 4, 62, 4, 57, 4, 32, 0, 28, 4, 48, 4, 60, 4, 67, 4, 59, 4, 53, 4, 32, 0, 62, 4, 66, 4, 32, 0, 28, 4, 48, 4, 57, 4, 56, 4, 44, 0, 32, 0, 49, 0, 49, 0, 32, 0, 79, 4, 61, 4, 50, 4, 48, 4, 64, 4, 79, 4, 32, 0, 49, 0, 57, 0, 52, 0, 52, 0, 46, 0, 0, 0]
>>> bytes(b).decode("utf-16")
'Милой Мамуле от Майи, 11 января 1944.\x00'
In Python 2:
>>> b = [28, 4, 56, 4, 59, 4, 62, 4, 57, 4, 32, 0, 28, 4, 48, 4, 60, 4, 67, 4, 59, 4, 53, 4, 32, 0, 62, 4, 66, 4, 32, 0, 28, 4, 48, 4, 57, 4, 56, 4, 44, 0, 32, 0, 49, 0, 49, 0, 32, 0, 79, 4, 61, 4, 50, 4, 48, 4, 64, 4, 79, 4, 32, 0, 49, 0, 57, 0, 52, 0, 52, 0, 46, 0, 0, 0]
>>> "".join(chr(c) for c in b).decode("utf-16")
u'\u041c\u0438\u043b\u043e\u0439 \u041c\u0430\u043c\u0443\u043b\u0435 \u043e\u0442 \u041c\u0430\u0439\u0438, 11 \u044f\u043d\u0432\u0430\u0440\u044f 1944.\x00'
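If you want this as a reusable step, here is a minimal Python 3 sketch. The helper name decode_xp_field is made up for illustration and is not part of pyexiv2. It assumes you already have the field as a sequence of integers (0-255), like the numbers shown by print title, and it names the byte order explicitly instead of relying on the plain "utf-16" codec's default:

```python
def decode_xp_field(byte_values):
    """Decode integer byte values (0-255) as UTF-16LE text.

    Hypothetical helper for illustration; not a pyexiv2 function.
    """
    text = bytes(byte_values).decode("utf-16-le")
    # Drop the trailing NUL terminator(s) stored with the field.
    return text.rstrip("\x00")


# First word of the title from the question, plus the NUL terminator.
raw = [28, 4, 56, 4, 59, 4, 62, 4, 57, 4, 0, 0]
title = decode_xp_field(raw)
print(title)
```

Using "utf-16-le" avoids depending on the machine's native byte order when the data has no byte order mark.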
I think title.human_value is a UTF-8 encoded byte string, already converted from the raw UTF-16 bytes of title.
In the Python 2 shell, running in a terminal window on OS X:
>>> # this should be the same as your title.human_value:
>>> print ''.join( chr(x) for x in [208, 156, 208, 184, 208,
187, 208, 190, 208, 185, 32, 208, 156, 208,
176, 208, 188, 209, 131, 208, 187, 208, 181,
32, 208, 190, 209, 130, 32, 208, 156, 208,
176, 208, 185, 208, 184, 44, 32, 49, 49, 32,
209, 143, 208, 189, 208, 178, 208, 176, 209,
128, 209, 143, 32, 49, 57, 52, 52, 46])
Милой Мамуле от Майи, 11 января 1944.
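The garbled output in the question is what UTF-8 bytes look like when something displays them one byte at a time through a Western single-byte code page. A small Python 3 sketch of that effect follows. It uses latin-1 as a stand-in for whatever code page the console actually applies (that choice is an assumption; latin-1 is convenient because it maps every byte value, so the round trip is lossless):

```python
original = "января"

# Encode to UTF-8: each Cyrillic letter becomes two bytes.
utf8_bytes = original.encode("utf-8")

# Misread those bytes as a single-byte encoding: one character per byte.
mojibake = utf8_bytes.decode("latin-1")

# Reversing the wrong step recovers the text.
recovered = mojibake.encode("latin-1").decode("utf-8")

print(len(original), len(utf8_bytes), len(mojibake))
print(ascii(mojibake))
print(recovered == original)
```

So the data itself is intact. Only the interpretation of the bytes at display time is wrong.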
Your console may not support Cyrillic characters. You might try setting the font in the Command Prompt to "Lucida Console" -- a more modern vector font is more likely to support it correctly than the historical bitmapped fonts that cmd defaults to.
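Besides the font, the console's code page matters: an English Windows console typically uses an OEM code page such as cp437, which has no Cyrillic letters at all. As a rough check, here is a Python 3 sketch that tests whether a given encoding can represent a string. The function name encoding_can_represent is made up for illustration, and it only tests the encoding, not whether the font has the glyphs:

```python
import sys


def encoding_can_represent(text, encoding=None):
    """Return True if `text` can be encoded with `encoding`.

    Hypothetical helper; defaults to the encoding Python uses for stdout.
    """
    encoding = encoding or sys.stdout.encoding or "ascii"
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(sys.stdout.encoding)                      # what Python will use for print
print(encoding_can_represent("Милой", "cp437"))  # US OEM code page
print(encoding_can_represent("Милой", "cp866"))  # Cyrillic OEM code page
print(encoding_can_represent("Милой", "utf-8"))
```

If the check fails for your console's encoding, writing the decoded text to a UTF-8 file and opening it in an editor is a simple way to confirm the data is correct.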