BeautifulSoup

Question

I would like to get all text between two tags:

<div class="lead">I DONT WANT this</div>

#many different tags - p, table, h2 including text that I want

<div class="image">...</div>

I started this way:

import urllib.request
from bs4 import BeautifulSoup

url = "http://......."
req = urllib.request.Request(url)
source = urllib.request.urlopen(req)
soup = BeautifulSoup(source, 'lxml')

start = soup.find('div', {'class': 'lead'})
end = soup.find('div', {'class': 'image'})

I have no idea what to do next.

matsbauer · Accepted Answer

Try this code. It walks the siblings that follow the div with class lead, stops the loop when it reaches the div with class image, and collects every node in between as an HTML string, which it then prints. It can be changed to collect only the text instead of the full markup:

html = ""
end = soup.find("div", {"class": "image"})  # look up the end marker once
for tag in soup.find("div", {"class": "lead"}).next_siblings:
    if tag is end:
        break  # stop at the end marker
    html += str(tag)  # Python 3: str() replaces unicode()
print(html)
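Since the question asks for the text rather than the markup, here is a minimal sketch of the same sibling walk that gathers text only. The SAMPLE markup and the helper name text_between are made up for illustration and are not part of the original post; the sketch also assumes both divs share the same parent, and it uses the built-in html.parser so it runs without lxml.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical sample markup standing in for the real page.
SAMPLE = """
<div class="lead">I DONT WANT this</div>
<p>First paragraph</p>
<h2>A heading</h2>
<table><tr><td>cell</td></tr></table>
<div class="image">...</div>
"""


def text_between(soup, start_class, end_class):
    """Return the text of every sibling between the two marker divs.

    Hypothetical helper: assumes both divs are siblings (same parent).
    """
    start = soup.find("div", class_=start_class)
    end = soup.find("div", class_=end_class)
    if start is None or end is None:
        return []  # one of the markers is missing

    parts = []
    for node in start.next_siblings:
        if node is end:
            break  # reached the end marker
        if isinstance(node, NavigableString):
            # Bare text between tags (often just whitespace).
            text = str(node).strip()
        else:
            # A tag: flatten all nested text into one string.
            text = node.get_text(" ", strip=True)
        if text:
            parts.append(text)
    return parts


soup = BeautifulSoup(SAMPLE, "html.parser")
result = text_between(soup, "lead", "image")
print(result)
```

If the two divs are not siblings, next_siblings will never reach the end marker, and a different traversal (for example over next_elements) would probably be needed.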

BeautifulSoup - How to get all text between two different tags?

Tags:

python

Alek SZ
