How to convert encoding in Python?

Tags:

python

encoding

I have a mis-encoded string, »Æ¹ûÊ÷. On the website http://2cyr.com/decode/?lang=en, you can encode it with gb2312 and then decode it with iso8859 to display it correctly.

In C#, there's a function called Encoding.Convert, which converts bytes from one encoding to another. The process is straightforward:


encode the string into bytesA, using gb2312 encoder
Encoding.Convert bytesA from gb2312 encoding to iso8859 encoding
decode the bytes using iso8859 encoder

In Python, I have tried every encoding and decoding method I can think of, but none of them converts the given string into a form that displays correctly.


asked Jan 04 '14 14:01

David S.

1 Answer

Your data is UTF-8 encoded GB2312, at least as pasted into my UTF-8 configured terminal window:


>>> data = '»Æ¹ûÊ÷'
>>> data.decode('utf8').encode('latin1').decode('gb2312')
u'\u9ec4\u679c\u6811'
>>> print _
黄果树

Encoding to Latin 1 lets us interpret characters as bytes to fix the encoding.
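Why Latin-1 works for this can be checked directly. The following is a minimal Python 3 sketch (the session above is Python 2); the variable names are illustrative only, and the mis-decoded characters are written as escapes so the snippet does not depend on the source file's encoding. It shows that Latin-1 maps code points U+0000 to U+00FF one-to-one onto bytes 0x00 to 0xFF, so encoding with it recovers the original bytes unchanged.

```python
# Latin-1 maps code points U+0000..U+00FF one-to-one onto bytes 0x00..0xFF.
all_bytes = bytes(range(256))
as_text = all_bytes.decode('latin-1')

# Every byte value becomes the character with the same numeric code point...
same_values = [ord(ch) for ch in as_text] == list(range(256))
# ...and encoding back returns exactly the bytes we started with.
round_trips = as_text.encode('latin-1') == all_bytes

# The six mis-decoded characters from the question, written as escapes.
mojibake = '\xbb\xc6\xb9\xfb\xca\xf7'

# Encoding to Latin-1 reinterprets each character as the byte of the same value.
raw = mojibake.encode('latin-1')

# Those bytes are the real GB2312 data, so decoding them gives the intended text.
fixed = raw.decode('gb2312')
```

Here `fixed` should equal `'\u9ec4\u679c\u6811'`, the same three characters the session above prints.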

Rule of thumb: whenever you have double-encoded data, undo the extra 'layer' of encoding by decoding to Unicode using that codec, then encoding again with Latin-1 to get bytes again.
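As a rough Python 3 sketch of that rule of thumb: the helper name `undo_extra_layer` is hypothetical, and the code assumes the spurious layer came from treating the bytes as Latin-1 (if another single-byte codec such as cp1252 was involved, that codec would have to be used for the re-encoding step instead).

```python
def undo_extra_layer(data: bytes, outer: str = 'utf-8') -> bytes:
    """Strip one spurious encoding layer.

    Decode with the outer codec, then map the resulting characters
    back to bytes via Latin-1 (assumed to be the codec that caused
    the extra layer).
    """
    return data.decode(outer).encode('latin-1')


# Simulate the double-encoded input: the UTF-8 bytes of the six
# mis-decoded characters from the question, written as escapes.
double_encoded = '\xbb\xc6\xb9\xfb\xca\xf7'.encode('utf-8')

# Remove the extra UTF-8 layer to get the original GB2312 bytes...
inner = undo_extra_layer(double_encoded)

# ...then decode them with the codec the data was really written in.
text = inner.decode('gb2312')
```

In Python 3 a `str` literal is already decoded text, so when the mis-decoded characters are held in a `str` rather than `bytes`, the first `decode` step is unnecessary and `.encode('latin-1').decode('gb2312')` is enough.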


answered Sep 20 '22 01:09

Martijn Pieters
