Issue scraping with Beautiful Soup

Question

I've scraped websites before using this same technique, but with this website it doesn't seem to work.

import urllib2
from BeautifulSoup import BeautifulSoup
url = "http://www.weatheronline.co.uk/weather/maps/current?LANG=en&DATE=1354104000&CONT=euro&LAND=UK&KEY=UK&SORT=1&INT=06&TYP=sonne&ART=tabelle&RUBRIK=akt&R=310&CEL=C"
page=urllib2.urlopen(url).read()
soup = BeautifulSoup(page)
print soup

The output should be the content of the webpage, but instead I am just getting this:

GIF89a (followed by some symbols I can't copy here)

Any ideas what the problem is and how I should proceed?

Abhijit · Accepted Answer

but I want to know why I am getting a GIF when accessing the URL like that, when accessing it via my browser gives me the website perfectly.

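Because these guys are smart and don't want their website to be accessed from outside a web browser. The default urllib2 request does not identify itself as a browser, so the server answers with a GIF instead of the page. What you need to do is imitate a known browser by adding a User-agent header to the request. Here is a modified example that will work: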

>>> import urllib2
>>> opener = urllib2.build_opener()
>>> opener.addheaders = [('User-agent', 'Mozilla/5.0')]
>>> url = "http://www.weatheronline.co.uk/weather/maps/current?LANG=en&DATE=1354104000&CONT=euro&LAND=UK&KEY=UK&SORT=1&INT=06&TYP=sonne&ART=tabelle&RUBRIK=akt&R=310&CEL=C"
>>> response = opener.open(url)
>>> page = response.read()
>>> from BeautifulSoup import BeautifulSoup
>>> soup = BeautifulSoup(page)
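
For readers on Python 3, where urllib2 was merged into urllib.request, the same idea might look roughly like the sketch below. The helper names (build_request, looks_like_gif, fetch) and the timeout are hypothetical and only for illustration. The GIF check just tests the first bytes of the response for the "GIF87a"/"GIF89a" signature seen in the question's output. Whether the site still responds this way is not something this sketch can confirm.

```python
import urllib.request

# Byte signatures that every GIF file starts with (the question's output began with "GIF89a").
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request carrying a browser-like User-agent header (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-agent": user_agent})


def looks_like_gif(payload):
    """Return True if the raw response bytes start with a GIF signature."""
    return payload.startswith(GIF_SIGNATURES)


def fetch(url, user_agent="Mozilla/5.0", timeout=10):
    """Download the body and raise if the server sent a GIF instead of a page."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = response.read()
    if looks_like_gif(payload):
        raise ValueError("server returned a GIF image, not an HTML page")
    return payload
```

The bytes returned by fetch(url) can then be handed to BeautifulSoup in the same way as page in the example above.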
