Text from website appears as Gibberish instead of Hebrew

Question

I'm trying to get a string from a website. I use the requests module to send the GET request.

import requests

text = requests.get("http://example.com")  # send a GET request to the website
print text.text  # print the decoded response body

However, for some reason the text appears as gibberish instead of Hebrew:

<div>
<p>×©×¨×ª</p>
</div>

However, when I inspect the traffic with Fiddler or view the page in my browser, I see it in Hebrew:

<div>
<p>שרת</p>
</div>

By the way, the HTML contains a meta tag that declares the encoding as UTF-8. I tried encoding the text to UTF-8, but it is still gibberish. I tried decoding it as UTF-8, but that raises a UnicodeEncodeError exception. I also declared UTF-8 as the source encoding in the first line of the script. Moreover, the problem also happens when I send the request with the built-in urllib module.
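A minimal sketch of what is going on, assuming (as the accepted answer below diagnoses) that the page's UTF-8 bytes are being decoded as Latin-1. It uses only the standard library, writes the Hebrew word with escape sequences so the source stays ASCII, and uses illustrative variable names. It is written in Python 3 syntax; on the asker's Python 2.7 the same calls apply to `str`/`unicode` objects. It also shows why encoding the garbled text to UTF-8 cannot help: the wrong characters are already in the string, and encoding only turns each of them into new bytes.

```python
# The Hebrew word from the page, written with escapes to keep this file ASCII.
hebrew = u'\u05e9\u05e8\u05ea'

# What the server actually sends: UTF-8, two bytes per Hebrew letter.
raw = hebrew.encode('utf-8')
print(len(raw))        # 6

# Assumed mistake: those bytes are decoded as Latin-1, one character per byte.
garbled = raw.decode('latin-1')
print(garbled == u'\u00d7\u00a9\u00d7\u00a8\u00d7\u00aa')  # True: the gibberish from the question

# Encoding the garbled text to UTF-8 keeps the gibberish: each of the six
# Latin-1 characters above U+007F becomes a two-byte UTF-8 sequence.
reencoded = garbled.encode('utf-8')
print(len(reencoded))  # 12

# Decoding the original bytes with the right codec is what recovers the text.
print(raw.decode('utf-8') == hebrew)  # True
```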

I read the Unicode HOWTO, but still couldn't fix it. I also read many threads here (about both the UnicodeEncodeError exception and why Hebrew turns into gibberish in Python), without success.

I'm using Python 2.7.9 on Windows, and I'm running my script in IDLE.

Thanks in advance.

Ignacio Vazquez-Abrams · Accepted Answer

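The server isn't declaring the encoding correctly, so the UTF-8 bytes it sends are being decoded as Latin-1 (ISO-8859-1). Reversing that step recovers the Hebrew: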

>>> print u'×©×¨×ª'.encode('latin-1').decode('utf-8')
שרת
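The one-liner above can be wrapped in a small helper for the case where you only have the already-garbled string. `undo_latin1_mojibake` is a hypothetical name, and the helper assumes the specific mistake diagnosed here (UTF-8 bytes decoded as Latin-1); if the round trip fails, it returns the input unchanged.

```python
def undo_latin1_mojibake(text):
    """Hypothetical helper: reverse UTF-8 bytes that were decoded as Latin-1."""
    try:
        # Latin-1 maps each character U+0000..U+00FF back to exactly one byte,
        # which recovers the original UTF-8 bytes; then decode them properly.
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Characters outside Latin-1, or bytes that are not valid UTF-8:
        # the text was not garbled this way, so leave it alone.
        return text


print(undo_latin1_mojibake(u'\u00d7\u00a9\u00d7\u00a8\u00d7\u00aa') == u'\u05e9\u05e8\u05ea')  # True
print(undo_latin1_mojibake(u'caf\u00e9') == u'caf\u00e9')  # True: not valid UTF-8, returned as-is
```

Repairing after the fact is only a fallback: it can misfire on genuine Latin-1 text that happens to form valid UTF-8 (for example u'\u00c3\u00a9' becomes u'\u00e9'). Telling requests the right encoding before it decodes the body avoids producing the gibberish in the first place.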

Set text.encoding before accessing text.text.

import requests

text = requests.get("http://example.com")  # send a GET request to the website
text.encoding = 'utf-8'  # override the encoding the server declared (or failed to declare)
print text.text  # the body is now decoded as UTF-8
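Why the browser shows Hebrew while requests does not: browsers look for a charset in the Content-Type header and, failing that, in a `<meta>` tag, whereas requests sets `Response.encoding` from the HTTP header alone and, for `text/*` responses with no charset, falls back to ISO-8859-1; it does not read the `<meta>` tag. The stdlib-only sketch below contrasts the two strategies, assuming the server's header carries no charset. The header value, the body bytes, and the helpers `header_charset` and `meta_charset` are made-up illustrations, not requests' actual code, and the browser side is heavily simplified.

```python
import re

# Assumed response: the header omits the charset, but the HTML declares UTF-8.
content_type = 'text/html'
body = b'<meta charset="utf-8"><p>\xd7\xa9\xd7\xa8\xd7\xaa</p>'


def header_charset(content_type):
    """Hypothetical helper: charset parameter of a Content-Type value, or None."""
    for param in content_type.split(';')[1:]:
        name, _, value = param.strip().partition('=')
        if name.lower() == 'charset' and value:
            return value.strip('"\'')
    return None


def meta_charset(body):
    """Hypothetical helper: charset from a <meta ... charset=...> tag, or None."""
    match = re.search(br'<meta[^>]*charset=["\']?([\w-]+)', body, re.IGNORECASE)
    return match.group(1).decode('ascii') if match else None


# requests-like: header only; with no charset, fall back to ISO-8859-1
# (this sketch assumes a text/* content type).
requests_choice = header_charset(content_type) or 'ISO-8859-1'

# Simplified browser-like order: header first, then the <meta> tag.
browser_choice = header_charset(content_type) or meta_charset(body) or 'ISO-8859-1'

print(requests_choice)  # ISO-8859-1
print(browser_choice)   # utf-8
print(body.decode(browser_choice).endswith(u'<p>\u05e9\u05e8\u05ea</p>'))  # True
```

If you would rather not hard-code 'utf-8', requests also exposes `text.apparent_encoding`, a guess computed from the body bytes, which can be assigned to `text.encoding` instead; `text.content` gives the raw, undecoded bytes.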


Tags: python, encoding, unicode, utf-8, decoding

Asked by ohad987

