
Decoding and encoding a Hebrew string in Python

I am trying to encode and decode the Hebrew string "שלום". However, after encoding, I get gibberish:

>>> word = "שלום"
>>> word = word.decode('UTF-8')
>>> word
u'\u05e9\u05dc\u05d5\u05dd'
>>> print word
שלום
>>> word = word.encode('UTF-8')
>>> word
'\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
>>> print word
׳©׳׳•׳

How should I do it properly?

user1767774 asked Apr 24 '15 15:04



People also ask

How do you decode and encode a string in Python?

In Python 2, decode() is a method of str (byte string) objects. It takes the name of the encoding the bytes are stored in and returns the corresponding unicode string. It is the inverse of encode(), which converts a unicode string to bytes in the given encoding.

What is encoding and decoding in Python?

Representing a unicode string as a string of bytes is known as encoding. Converting a string of bytes back to a unicode string is known as decoding.
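The round trip described above can be sketched as follows. This is a minimal illustration, not part of the original thread. The Hebrew word is written with escape sequences so the snippet does not depend on the source file or terminal encoding, and it is meant to run on both Python 2 and Python 3.

```python
# The word from the question, written as Unicode code points.
text = u'\u05e9\u05dc\u05d5\u05dd'

# Encoding: unicode text -> bytes (two bytes per Hebrew letter in UTF-8).
data = text.encode('utf-8')

# Decoding: bytes -> unicode text.
round_tripped = data.decode('utf-8')

print(repr(data))
print(round_tripped == text)
```

The byte sequence produced here is the same one shown in the question's session, which suggests the encoding step itself was correct.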

How do you find the encoding of a string in Python?

You can use type or isinstance to tell whether a value is a str or a unicode object, but not which encoding a byte string uses. In Python 2, str is just a sequence of bytes, and Python doesn't know what its encoding is. The unicode type is the safer way to store text.
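As a rough sketch of that check, the helper below distinguishes text from bytes. The name is_text is hypothetical and introduced only for illustration. The type(u'') trick is an assumption made so the same code runs on Python 2 (where it yields unicode) and Python 3 (where it yields str).

```python
# type(u'') is unicode on Python 2 and str on Python 3.
text_type = type(u'')


def is_text(value):
    """Return True for a unicode text object, False for a byte string."""
    return isinstance(value, text_type)


text = u'\u05e9\u05dc\u05d5\u05dd'
data = text.encode('utf-8')

print(is_text(text))
print(is_text(data))
```

Note that this only tells you the type of the object. The encoding of data cannot be read off the bytes themselves, so it has to be known from context.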


1 Answer

You'll have to make sure your environment (shell or script) uses the right encoding. If you're using a script, include the following at the top:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

The coding declaration tells Python that the source file is encoded in UTF-8. You may also find that your terminal accepts only ASCII or uses another code page, so make sure it supports UTF-8.
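One way to check what the terminal claims to support is to look at sys.stdout.encoding. The sketch below is an illustration under assumptions: the helper names stdout_encoding and can_encode are hypothetical, and the reported encoding may be missing (for example when output is piped on Python 2), which the code treats as 'unknown'.

```python
import sys


def stdout_encoding():
    """Return the encoding Python reports for standard output, or 'unknown'."""
    return getattr(sys.stdout, 'encoding', None) or 'unknown'


def can_encode(text, encoding):
    """Return True if text can be represented in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except (UnicodeEncodeError, LookupError):
        # LookupError covers encoding names Python does not recognise.
        return False


word = u'\u05e9\u05dc\u05d5\u05dd'
print(stdout_encoding())
print(can_encode(word, stdout_encoding()))
```

If the second line prints False, the terminal's reported encoding cannot represent Hebrew, and printing the word is likely to fail or come out garbled.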

>>> word = "שלום"
>>> word
'\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
>>> print word
שלום
>>> word = word.decode('UTF-8')
>>> word
u'\u05e9\u05dc\u05d5\u05dd'
>>> print word
שלום
>>> word = word.encode('UTF-8')
>>> word
'\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
>>> print word
שלום
>>>
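The gibberish in the question is consistent with correct UTF-8 bytes being displayed by a terminal that interprets them in a different single-byte encoding. The sketch below reproduces that effect assuming the terminal used Windows code page 1255 (Hebrew). That code page is a guess, since the question does not say which terminal or locale was in use.

```python
text = u'\u05e9\u05dc\u05d5\u05dd'

# Correct UTF-8 bytes, as in the session above.
data = text.encode('utf-8')

# What a cp1255 terminal would make of those bytes (assumed code page).
# Bytes with no meaning in cp1255 become the replacement character.
garbled = data.decode('cp1255', 'replace')

# Interpreting the same bytes as UTF-8 recovers the original word.
restored = data.decode('utf-8')

print(garbled != text)
print(restored == text)
```

The bytes are identical in both cases. Only the interpretation applied when displaying them differs, which is why fixing the terminal encoding rather than the Python code resolves the problem.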
jonhurlock answered Oct 30 '22 16:10
