How to convert a string from CP-1251 to UTF-8?

Tags:

python

utf-8

wxpython

cp1251

I'm using mutagen to convert ID3 tag data from CP-1251/CP-1252 to UTF-8. On Linux there is no problem, but on Windows, calling SetValue() on a wx.TextCtrl produces the error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
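One plausible reading of this traceback, sketched in Python 3 (the question's code is Python 2, so this is a translation rather than the exact code path inside wx): 0xC3 is the lead byte UTF-8 uses for characters U+00C0 to U+00FF, such as the 'Á' (U+00C1) that starts the tag value. That suggests the value was encoded to UTF-8 somewhere and the resulting byte string was then decoded as ASCII. Where exactly that happens inside wx is an assumption here.

```python
# First character of the tag value: U+00C1, shown as 'Á' under Latin-1.
ch = "\xc1"

# UTF-8 encodes it as two bytes, and the first one is 0xC3.
utf8_bytes = ch.encode("utf-8")
print(utf8_bytes)  # b'\xc3\x81'

# Decoding those bytes as ASCII fails on 0xC3, matching the traceback.
error_message = None
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)
print(error_message)
# 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
```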

The original string (assumed to be CP-1251 encoded) that I'm pulling from mutagen is:

u'\xc1\xe5\xeb\xe0\xff \xff\xe1\xeb\xfb\xed\xff \xe3\xf0\xee\xec\xf3'
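Every code point in that value is below 256, which is what you get when CP-1251 bytes are decoded with a Western codepage. A quick Python 3 check, under that assumption: encoding with Latin-1 gives back the raw bytes one-to-one, so you can decode them with each candidate codepage and compare. CP-1252 agrees with Latin-1 for all of these bytes, so it reproduces the garbled string, while CP-1251 yields readable Cyrillic.

```python
# The value exactly as returned by mutagen, written as a Python 3 str literal.
tag = "\xc1\xe5\xeb\xe0\xff \xff\xe1\xeb\xfb\xed\xff \xe3\xf0\xee\xec\xf3"

# All code points fit in one byte, so Latin-1 recovers the raw bytes.
assert all(ord(c) < 256 for c in tag)
raw = tag.encode("latin-1")

# Decode the same bytes with each candidate codepage and compare.
for codec in ("cp1252", "cp1251"):
    print(codec, raw.decode(codec))
# cp1252 Áåëàÿ ÿáëûíÿ ãðîìó
# cp1251 Белая яблыня грому
```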

I've tried converting this to UTF-8:

dd = d.decode('utf-8')
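In Python 3 terms (a sketch, assuming the same tag value): decoding turns bytes into text, so calling decode() on a value that is already Unicode runs in the wrong direction, and Python 3's str has no decode() method at all. The underlying bytes are not valid UTF-8 either, because 0xC1 can never start a UTF-8 sequence.

```python
tag = "\xc1\xe5\xeb\xe0\xff \xff\xe1\xeb\xfb\xed\xff \xe3\xf0\xee\xec\xf3"

# Only bytes can be decoded; text (str) can only be encoded.
print(hasattr(str, "decode"), hasattr(bytes, "decode"))  # False True

# The raw bytes behind the tag are not UTF-8, so decoding them as UTF-8 fails.
raw = tag.encode("latin-1")
is_utf8 = True
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    is_utf8 = False
print(is_utf8)  # False
```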

...and even changing the default encoding from ASCII to UTF-8:

sys.setdefaultencoding('utf-8')

...but I get the same error.

594

asked Sep 26 '11 12:09 by jsnjack

4 Answers

If you know for sure that your input is cp1251, you can do:

d.decode('cp1251').encode('utf8')
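This recipe assumes d is a byte string. A Python 3 sketch of the same two steps, using the first word of the tag as hypothetical input bytes:

```python
# Hypothetical input: bytes known to be CP-1251 (the first word of the tag).
raw = b"\xc1\xe5\xeb\xe0\xff"

text = raw.decode("cp1251")        # bytes -> str: "Белая"
utf8_bytes = text.encode("utf-8")  # str -> UTF-8 bytes

print(text)        # Белая
print(utf8_bytes)  # b'\xd0\x91\xd0\xb5\xd0\xbb\xd0\xb0\xd1\x8f'
```

In Python 3 you can usually stop at the decoded str and pass it to the GUI directly; encoding to UTF-8 is only needed when writing bytes somewhere, such as a file or socket.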

162

answered Oct 13 '22 23:10

If d is a correct Unicode string, then d.encode('utf-8') yields a UTF-8 encoded bytestring. Don't test it by printing, though; it might just not display properly because of the codepage shenanigans.
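A Python 3 sketch of that caveat, assuming the garbled tag from the question: encoding it to UTF-8 succeeds but encodes the wrong characters, and ascii() shows the actual code points regardless of how the console renders them. The correctly decoded word is included for contrast.

```python
garbled = "\xc1\xe5\xeb\xe0\xff"            # first word, still mis-decoded
correct = "\u0411\u0435\u043b\u0430\u044f"  # "Белая" as real Cyrillic

# Both encode without error, but to different bytes.
print(garbled.encode("utf-8"))  # b'\xc3\x81\xc3\xa5\xc3\xab\xc3\xa0\xc3\xbf'
print(correct.encode("utf-8"))  # b'\xd0\x91\xd0\xb5\xd0\xbb\xd0\xb0\xd1\x8f'

# ascii() prints escapes for the code points, so nothing depends on the console.
print(ascii(garbled))  # '\xc1\xe5\xeb\xe0\xff'
print(ascii(correct))  # '\u0411\u0435\u043b\u0430\u044f'
```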

answered Oct 14 '22 00:10

Cat Plus Plus

I'd rather add a comment to Александр Степаненко's answer, but my reputation doesn't allow it yet. I had a similar problem converting MP3 tags from CP-1251 to UTF-8, and the encode/decode/encode solution worked for me, except that I had to make the first encoding "latin-1", which essentially turns the Unicode string into a byte sequence without any real conversion (each code point below 256 becomes the byte with the same value):

print text.encode("latin-1").decode('cp1251').encode('utf8')

and when saving back, for example with mutagen, the result doesn't need to be encoded:

audio["title"] = title.encode("latin-1").decode('cp1251')
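A Python 3 sketch of this approach. fix_cp1251_mojibake is a hypothetical helper name; it returns str, which is what you would store back (per the answer, no UTF-8 encoding step is needed for that). The guard that leaves a value unchanged when it cannot be encoded as Latin-1 (because it already holds real Cyrillic or other characters above U+00FF) or cannot be decoded as CP-1251 is an addition of this sketch, not part of the answer.

```python
def fix_cp1251_mojibake(text: str) -> str:
    """Re-read a str whose code points are really CP-1251 bytes.

    Latin-1 maps code points 0-255 one-to-one onto bytes, so encoding with it
    recovers the original bytes unchanged; those bytes are then decoded with
    the codepage they were actually written in.
    """
    try:
        return text.encode("latin-1").decode("cp1251")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Characters above U+00FF (e.g. already-correct Cyrillic), or a byte
        # the cp1251 codec cannot decode: return the value unchanged.
        return text


tag = "\xc1\xe5\xeb\xe0\xff \xff\xe1\xeb\xfb\xed\xff \xe3\xf0\xee\xec\xf3"
print(fix_cp1251_mojibake(tag))  # Белая яблыня грому

# Values that are already correct, or plain ASCII, pass through unchanged.
values = [tag, "\u0411\u0435\u043b\u0430\u044f", "Track 1"]
print([fix_cp1251_mojibake(v) for v in values])
# ['Белая яблыня грому', 'Белая', 'Track 1']
```

The guard is only a heuristic: a genuine Latin-1 title such as "Café" also encodes cleanly and would come out as "Cafй", so apply the conversion only to files known to carry CP-1251 data.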

answered Oct 14 '22 00:10

Andrey
