Python Arabic encoding issue

Tags:

python

encoding

I have text in Windows-1256 encoding, and I want to convert it from Arabic (Windows-1256) to UTF-8.

Sample text:

Óæí Ïæã ÈíåÞí

Expected result:

سوي دوم بيهقي

I use this code to decode the text and encode it to UTF-8:

# -*- coding: utf-8 -*-

data = "Óæí Ïæã ÈíåÞí"
print data.decode("windows-1256", "replace")
print data.encode("windows-1256")

That code returns this result:

أ“أ¦أ أڈأ¦أ£ أˆأأ¥أ‍أ
Traceback (most recent call last):
  File "mohmal2.py", line 5, in <module>
    print data.encode("windows-1256")
  File "/usr/lib/python2.7/encodings/cp1256.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

I found a site that can convert this text:

http://www.iosart.com

asked Apr 19 '17 13:04

Amir Mohsen

2 Answers

It looks like you have accidentally decoded the input as Windows-1252. Encoding the text back to cp1252 recovers the original bytes, which can then be decoded as cp1256:

>>> "Óæí Ïæã ÈíåÞí".encode('cp1252').decode('cp1256')
'سوي دوم بيهقي'
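
To get from there to actual UTF-8 bytes, the same round trip can be wrapped in a small helper. This is only a sketch for Python 3: the function name `repair_cp1256_mojibake` is made up for illustration, and it assumes the garbled text really came from Windows-1256 bytes read as Windows-1252.

```python
def repair_cp1256_mojibake(text: str) -> str:
    """Undo a wrong Windows-1252 decode of bytes that were really Windows-1256.

    Hypothetical helper, not part of any library.
    """
    # Step 1: map each character back to the single byte it was read from.
    raw = text.encode("cp1252")
    # Step 2: interpret those bytes with the encoding they were written in.
    return raw.decode("cp1256")


garbled = "Óæí Ïæã ÈíåÞí"
fixed = repair_cp1256_mojibake(garbled)

# A Python 3 str is already Unicode; encode it only when bytes are needed,
# for example when writing to a file or a socket.
utf8_bytes = fixed.encode("utf-8")

print(fixed)
```

If the input contains a character that Windows-1252 cannot represent, `encode("cp1252")` raises `UnicodeEncodeError`, which is a useful sign that the text was garbled some other way.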

answered Oct 03 '22 02:10

Josh Lee

I would like to add the Python 2 case to @josh-lee's answer.
If you are using Python 2, add the unicode prefix u.

>>> u"Óæí Ïæã ÈíåÞí".encode('cp1252').decode('cp1256')
u'\u0633\u0648\u064a \u062f\u0648\u0645 \u0628\u064a\u0647\u0642\u064a'
>>> print _
سوي دوم بيهقي
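
The reason the prefix matters: without `u`, a Python 2 literal is a byte string, and with the `coding: utf-8` declaration it holds the UTF-8 bytes of the characters as typed. Decoding those bytes as Windows-1256 gives the garbled output from the question, and calling `.encode()` on a byte string makes Python 2 first decode it implicitly as ASCII, hence the `UnicodeDecodeError`. The sketch below reproduces the first effect in Python 3, where bytes and text are separate types; the variable names are only illustrative.

```python
text = "Óæí Ïæã ÈíåÞí"

# What the Python 2 byte-string literal contained: the UTF-8 bytes of the text.
raw = text.encode("utf-8")

# What data.decode("windows-1256", "replace") did in the question:
# every UTF-8 byte is read as one Windows-1256 character.
garbled = raw.decode("cp1256", "replace")

print(garbled)
```

Each non-ASCII character becomes two characters here, because its UTF-8 form is two bytes long.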

answered Oct 03 '22 01:10

سليمان السهمي Suleyman Sahmi
