HTML parsing with BeautifulSoup 4 and Python

Question

I am trying to parse the result list of http://mobile.de.

First I tried the HTMLParser class, but I got an error: HTMLParser.HTMLParseError: EOF in middle of construct.

So I tried BeautifulSoup 4, which copes better with invalid markup, but the <div> I’m searching for isn’t accessible, and I can’t tell if it’s my fault or the website’s.

    from bs4 import BeautifulSoup
    import urllib
    import socket

    searchurl = "http://suchen.mobile.de/auto/search.html?scopeId=C&isSearchRequest=true&sortOption.sortBy=price.consumerGrossEuro"
    f = urllib.urlopen(searchurl)
    html = f.read()
    soup = BeautifulSoup(html)

    for link in soup.find_all("div","listEntry "):
        print link

listEntry is the class of the <div> elements that hold the car results. But it seems that BeautifulSoup isn’t parsing <form id="parkAndCompareVehicle" name="parkAndCompareVehicle" action="">. I can’t find the form in the soup object.

Where is the fault?

gorlum0 · Accepted Answer

It should be something like:

    for link in soup.findAll('div', {'class': 'listEntry '}):
        print link

Attributes are specified in a dictionary: findAll(self, name=None, attrs={}, recursive=True, text=None, limit=None, **kwargs)

==========

Update: Sorry, it seems that in bs4 you can do it your way as well.
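
To make the equivalence concrete, here is a minimal sketch of the ways bs4 lets you filter on a class. The markup is a made-up stand-in, not the real mobile.de page, and the code is written for Python 3 with the built-in "html.parser" builder, unlike the Python 2 snippets in this thread. The class name is written without the trailing space here, on the assumption that this is what you want to match, since bs4 splits class values on whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the results page (not the real mobile.de markup).
SAMPLE_HTML = """
<div class="listEntry">Car A</div>
<div class="listEntry odd">Car B</div>
<div class="other">Not a result</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# 1. bs4 shortcut: a string as the second positional argument filters on class.
by_position = soup.find_all("div", "listEntry")

# 2. Explicit attribute dictionary, as in the findAll signature quoted above.
by_dict = soup.find_all("div", {"class": "listEntry"})

# 3. Keyword form; the trailing underscore avoids the reserved word `class`.
by_keyword = soup.find_all("div", class_="listEntry")

texts = [div.get_text() for div in by_position]
print(texts)
```

All three calls should select the same two divs, because a class filter matches any one of the classes on a tag.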

As for the fault: as far as I can see, the form you're looking for is not in the results because it encloses the listEntry divs rather than sitting inside them.

What's wrong with this?

    form = soup.find('form', id='parkAndCompareVehicle')
    print len(form.find_all('div', 'listEntry '))
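
As a rough illustration of the point about the form enclosing the entries, the sketch below uses a simplified, invented page (the real markup may differ) and Python 3. It shows that a search for the divs returns only the divs, while the form stays reachable either as their ancestor or by searching for it directly.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: the form wraps the result divs.
SAMPLE_HTML = """
<form id="parkAndCompareVehicle" name="parkAndCompareVehicle" action="">
  <div class="listEntry">Car A</div>
  <div class="listEntry">Car B</div>
</form>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Searching for divs yields only the divs; the enclosing form is not among
# the results, but it can be reached as an ancestor of each div.
entries = soup.find_all("div", "listEntry")
parent_form = entries[0].find_parent("form")

# Searching from the form downward finds the same divs.
form = soup.find("form", id="parkAndCompareVehicle")
entries_in_form = form.find_all("div", "listEntry")

print(parent_form["id"], len(entries_in_form))
```

If soup.find returns None for the form on the real page, the parser has probably dropped or restructured it, and trying a different parser would be the next thing to check.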

Tags:

python

html

html-parsing

beautifulsoup

user1010775
