
BeautifulSoup - How to get all text between two different tags?

I would like to get all text between two tags:

<div class="lead">I DONT WANT this</div>

#many different tags - p, table, h2 including text that I want

<div class="image">...</div>

I started this way:

import urllib.request
from bs4 import BeautifulSoup

url = "http://......."
req = urllib.request.Request(url)
source = urllib.request.urlopen(req)
soup = BeautifulSoup(source, 'lxml')

start = soup.find('div', {'class': 'lead'})
end = soup.find('div', {'class': 'image'})

And I have no idea what to do next.

Alek SZ asked Jul 27 '17 09:07



1 Answer

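Try this code. It starts at the div with class lead, walks through the siblings that follow it, and stops the loop when it reaches the div with class image. It collects the markup of every node in between and prints it; this can be changed to collect only the text instead of the full tags: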

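end = soup.find("div", {"class": "image"})  # look up the end marker once
html = ""
for tag in soup.find("div", {"class": "lead"}).next_siblings:
    if tag is end:  # identity check: stop at the end marker itself
        break
    html += str(tag)  # Python 3: str() replaces unicode()
print(html)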
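
Since the question asks for the text rather than the markup, here is a minimal sketch of the same sibling walk that keeps only text. The sample HTML and the helper name text_between are made up for illustration, and html.parser is used instead of lxml only so the snippet needs no extra parser installed. It assumes the two divs share the same parent; if they are nested at different levels, next_siblings will never reach the end marker.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical sample markup standing in for the page in the question.
SAMPLE_HTML = """
<div class="lead">I DONT WANT this</div>
<p>First paragraph</p>
<h2>A heading</h2>
<table><tr><td>Cell</td></tr></table>
<div class="image">...</div>
<p>After the end marker</p>
"""


def text_between(start, end):
    """Collect the text of each sibling node that sits between start and end.

    Hypothetical helper; assumes start and end have the same parent.
    """
    parts = []
    for node in start.next_siblings:
        if node is end:
            break
        if isinstance(node, NavigableString):
            # Bare strings between tags are mostly whitespace.
            text = node.strip()
        else:
            # For tags, join all nested strings with single spaces.
            text = node.get_text(" ", strip=True)
        if text:
            parts.append(text)
    return parts


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
start = soup.find("div", {"class": "lead"})
end = soup.find("div", {"class": "image"})
result = text_between(start, end)
print(result)
```

Returning a list keeps one entry per element, so the caller can decide how to join them, for example with "\n".join(result).
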
matsbauer answered Oct 13 '22 06:10
