I am trying to encode and decode the Hebrew string "שלום". However, after encoding, I get gibberish:
>>> word = "שלום"
>>> word = word.decode('UTF-8')
>>> word
u'\u05e9\u05dc\u05d5\u05dd'
>>> print word
שלום
>>> word = word.encode('UTF-8')
>>> word
'\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
>>> print word
׳©׳׳•׳
How should I do it properly?
decode() is a method specified in Strings in Python 2. This method is used to convert from one encoding scheme, in which argument string is encoded to the desired encoding scheme. This works opposite to the encode. It accepts the encoding of the encoding string to decode it and returns the original string.
To represent a unicode string as a string of bytes is known as encoding. To convert a string of bytes to a unicode string is known as decoding.
You can use type or isinstance . In Python 2, str is just a sequence of bytes. Python doesn't know what its encoding is. The unicode type is the safer way to store text.
You'll have to make sure you have the right encoding in your environment (shell or script). If you're using a script include the following:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
To make sure your environment knows you're using UTF-8. You may find that your shell terminal will accept only ASCII, so make sure it is able to support UTF-8.
>>> word = "שלום"
>>> word
'\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
>>> print word
שלום
>>> word = word.decode('UTF-8')
>>> word
u'\u05e9\u05dc\u05d5\u05dd'
>>> print word
שלום
>>> word = word.encode('UTF-8')
>>> word
'\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
>>> print word
שלום
>>>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With