Decoding and encoding a Hebrew string in Python

Tags:

python

python-unicode

hebrew

I am trying to encode and decode the Hebrew string "שלום". However, after encoding, I get gibberish:

>>> word = "שלום"
>>> word = word.decode('UTF-8')
>>> word
u'\u05e9\u05dc\u05d5\u05dd'
>>> print word
שלום
>>> word = word.encode('UTF-8')
>>> word
'\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
>>> print word
׳©׳׳•׳

How should I do it properly?

853

asked Apr 24 '15 15:04

user1767774

1 Answer

You'll have to make sure your environment (shell or script) uses the right encoding. If you're running a script, include the following at the top of the file:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

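This declares the source file's encoding as UTF-8, so Python reads the Hebrew literal correctly. Your terminal must support UTF-8 as well: some shells accept only ASCII or use a legacy code page, and then UTF-8 bytes print as gibberish even though the data is correct. In a UTF-8 terminal, every step prints properly: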

>>> word = "שלום"
>>> word
'\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
>>> print word
שלום
>>> word = word.decode('UTF-8')
>>> word
u'\u05e9\u05dc\u05d5\u05dd'
>>> print word
שלום
>>> word = word.encode('UTF-8')
>>> word
'\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
>>> print word
שלום
>>>
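
To see why the encoded bytes can print as gibberish, here is a minimal Python 3 sketch (the session above is Python 2; in Python 3, `str` plays the role of Python 2's `unicode` and `bytes` the role of Python 2's `str`). It round-trips the word through UTF-8 and then reinterprets the same bytes as cp1255. That the asker's terminal used cp1255 is an assumption, chosen because it yields output resembling the gibberish in the question.

```python
import sys

# The Hebrew word from the question, written with escapes so this
# file's own encoding cannot affect the result.
text = "\u05e9\u05dc\u05d5\u05dd"

# Text -> bytes -> text: a lossless round trip through UTF-8.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# Assumption: the terminal reads bytes as cp1255 (Windows Hebrew).
# Bytes that cp1255 leaves undefined are dropped, for illustration only.
mojibake = utf8_bytes.decode("cp1255", errors="ignore")

# ascii() keeps the output printable on any terminal.
print("utf-8 bytes:", utf8_bytes)
print("round trip ok:", round_trip == text)
print("seen as cp1255:", ascii(mojibake))
print("stdout encoding:", sys.stdout.encoding)
```

If the last line reports something other than UTF-8, printing the decoded text rather than the encoded bytes is usually the safer choice, since Python then encodes it for the terminal itself.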

128

answered Oct 30 '22 16:10

jonhurlock
