
Why does python show me text in Chinese? [duplicate]

I am using requests and bs4 to scrape some data from a Chinese website that also has an English version. I wrote this to see if I get the right data:

import requests
from bs4 import BeautifulSoup

page = requests.get('http://dotamax.com/hero/rate/')
soup = BeautifulSoup(page.content, "lxml")
for i in soup.find_all('span'):
    print(i.text)

And I do. The only problem is that the text comes back in Chinese, although it is in English when I look at the page source. Why do I get Chinese instead of English, and how can I fix it?

asked Dec 15 '22 03:12 by Chen Guevara


1 Answer

The website appears to check the GET request for an Accept-Language header. If the request doesn't include one, the site serves the Chinese version. This is an easy fix: pass the header explicitly, as described in the requests documentation:

import requests
from bs4 import BeautifulSoup

headers = {'Accept-Language': 'en-US,en;q=0.8'}

page = requests.get('http://dotamax.com/hero/rate/', headers=headers)
soup = BeautifulSoup(page.content, "lxml")
for i in soup.find_all('span'):
    print(i.text)

produces:

Anti-Mage
Axe
Bane
Bloodseeker
Crystal Maiden
Drow Ranger
...

etc.
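
If you scrape several pages from the same site, it may be more convenient to set the header once on a session rather than on every call. The following is a minimal sketch, not part of the original answer; `make_session` is a hypothetical helper name, and it assumes the site keeps honouring the same Accept-Language value.

```python
import requests


def make_session(language='en-US,en;q=0.8'):
    """Return a requests.Session that sends Accept-Language on every request.

    make_session is a hypothetical helper used for illustration only.
    """
    session = requests.Session()
    # Headers stored on the session are merged into each request it sends.
    session.headers.update({'Accept-Language': language})
    return session


session = make_session()
# Every session.get(...) call now carries the header, for example:
# page = session.get('http://dotamax.com/hero/rate/')
print(session.headers['Accept-Language'])
```

Headers passed directly to an individual `session.get(..., headers=...)` call still override the session-level value for that one request.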

Usually, when a page looks different in your browser than in the content returned by requests, the cause is the type of request or the headers being sent. One really useful web-scraping tip that I wish I had learned much earlier: press F12 in Chrome or Firefox and open the "Network" tab, where you can inspect each request and its headers, which helps a lot when debugging.
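
To compare against what the browser's Network tab shows, you can also look at the headers requests would send, without touching the network. This is a rough sketch under that assumption; `headers_sent` is a hypothetical helper, and it relies on preparing a request through a session instead of sending it.

```python
import requests


def headers_sent(url, extra_headers=None):
    """Return the headers requests would send for a GET to url.

    headers_sent is a hypothetical helper; it only prepares the request
    and never contacts the server.
    """
    session = requests.Session()
    request = requests.Request('GET', url, headers=extra_headers)
    # prepare_request merges the session's default headers with ours.
    prepared = session.prepare_request(request)
    return dict(prepared.headers)


url = 'http://dotamax.com/hero/rate/'
default = headers_sent(url)
custom = headers_sent(url, {'Accept-Language': 'en-US,en;q=0.8'})

# requests does not add Accept-Language by default, which is why the
# site falls back to its Chinese version.
print(sorted(default))
print(custom['Accept-Language'])
```

Any header that appears in the browser's request but not in this list is a candidate for the difference in the response.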

answered Dec 31 '22 06:12 by n1c9