Is there a universal method to detect a string's charset? I use IPTC tags whose encoding is unknown. I need to detect the encoding and then convert the tags to UTF-8.
Can anybody help?
You want to use chardet, an encoding detector.
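A minimal sketch of how that could look, assuming the tag values arrive as raw bytes. The helper name `to_utf8` and the `fallbacks` list are made up for illustration and are not part of chardet. `chardet.detect()` returns a dict whose `"encoding"` entry is only a guess (and may be `None`), so the sketch tries a few common encodings when the guess is missing or fails. The import is guarded so the fallback path still runs where chardet is not installed.

```python
try:
    import chardet  # third-party: pip install chardet
except ImportError:
    chardet = None  # sketch still works, using only the fallback list


def to_utf8(raw, fallbacks=("utf-8", "cp1252", "latin-1")):
    """Return `raw` (bytes) re-encoded as UTF-8 bytes, using a best-effort guess."""
    # chardet.detect() returns a dict such as {"encoding": ..., "confidence": ...}.
    guess = chardet.detect(raw)["encoding"] if chardet else None
    candidates = ([guess] if guess else []) + list(fallbacks)
    for enc in candidates:
        try:
            return raw.decode(enc).encode("utf-8")
        except (UnicodeDecodeError, LookupError):
            # Wrong guess, or a codec name Python does not know: try the next one.
            continue
    # latin-1 maps every byte to a character, so this last resort cannot fail.
    return raw.decode("latin-1").encode("utf-8")
```

Detection is statistical, so short strings like IPTC captions can be misidentified; checking the `"confidence"` value in the result before trusting the guess may be worthwhile.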
It's a bit late, but there is another solution: try PyICU.
An example:
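    import icu

    def convert_encoding(data, new_coding='UTF-8'):
        # Ask ICU for the most likely charset of the raw bytes.
        coding = icu.CharsetDetector(data).detect().getName()
        # Re-encode only when the detected charset differs from the target.
        if new_coding.upper() != coding.upper():
            # bytes.decode() works on both Python 2 and 3 (unicode() is Python 2 only).
            data = data.decode(coding).encode(new_coding)
        return data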