Detect charset and convert to utf-8 in Python? [duplicate]

Is there a universal method for detecting a string's charset? I use IPTC tags whose encoding is unknown. I need to detect the encoding and then convert the tags to UTF-8.

Can anybody help?

robos85 asked Jul 15 '11 13:07



2 Answers

You want to use chardet, an encoding detector.
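For illustration, here is a minimal sketch of how that might look. It assumes the tag value arrives as raw bytes; `to_utf8` and its `fallback` parameter are hypothetical names made up for this example, and falling back to Latin-1 when nothing is detected is an assumption, not something chardet prescribes.

```python
try:
    import chardet  # third-party package: pip install chardet
except ImportError:  # keep the sketch importable when chardet is missing
    chardet = None


def to_utf8(raw, fallback="latin-1"):
    """Guess the encoding of `raw` (bytes) and return it re-encoded as UTF-8.

    `to_utf8` and `fallback` are hypothetical names used only in this sketch.
    """
    # chardet.detect() returns a dict with "encoding" and "confidence" keys.
    guess = chardet.detect(raw)["encoding"] if chardet else None
    # "encoding" is None when chardet cannot decide, so use the fallback then.
    text = raw.decode(guess or fallback, errors="replace")
    return text.encode("utf-8")
```

Calling `to_utf8(b"plain ascii caption")` returns the same bytes unchanged. Keep in mind that detection is a statistical guess: short strings such as IPTC captions are easy to misdetect, so it may be worth checking the `confidence` value in the result before trusting it.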

Ignacio Vazquez-Abrams answered Sep 29 '22 04:09



It's a bit late, but there is another solution: try pyicu.

An example:

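    import icu  # provided by the PyICU package

    def convert_encoding(data, new_coding='UTF-8'):
        # data is a byte string; ICU guesses its charset
        coding = icu.CharsetDetector(data).detect().getName()
        if new_coding.upper() != coding.upper():
            # Python 3 form; in Python 2 this was unicode(data, coding)
            data = data.decode(coding).encode(new_coding)
        return data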
parkouss answered Sep 29 '22 06:09
