I would like to get all text between two tags:
<div class="lead">I DONT WANT this</div>
#many different tags - p, table, h2 including text that I want
<div class="image">...</div>
I started this way:
import urllib.request
from bs4 import BeautifulSoup

url = "http://......."
req = urllib.request.Request(url)
source = urllib.request.urlopen(req)
soup = BeautifulSoup(source, 'lxml')
start = soup.find('div', {'class': 'lead'})
end = soup.find('div', {'class': 'image'})
And I have no idea what to do next.
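Try this code. It starts at the div with class lead, iterates over its following siblings, and breaks out of the loop when it reaches the div with class image. It prints the markup of every tag in between; this can be changed to print only the text, for example by calling get_text() on each tag: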
html = ""
end = soup.find("div", {"class": "image"})
for tag in soup.find("div", {"class": "lead"}).next_siblings:
    # Stop as soon as the closing marker div is reached
    if tag is end:
        break
    html += str(tag)
print(html)
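Since the question asks for the text rather than the markup, here is a minimal sketch of the same sibling walk that collects only text. It assumes both marker divs share the same parent. The sample markup, the helper name text_between, and the use of the built-in "html.parser" instead of lxml are illustrative choices, not part of the original code.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical sample markup standing in for the real page
SAMPLE = """
<div class="lead">I DONT WANT this</div>
<h2>Heading</h2>
<p>First <b>paragraph</b></p>
<table><tr><td>Cell</td></tr></table>
<div class="image">...</div>
"""

def text_between(start, end):
    """Collect the text of every sibling after `start`, stopping at `end`."""
    parts = []
    for node in start.next_siblings:
        if node is end:
            break
        if isinstance(node, NavigableString):
            # Bare strings between tags (often just whitespace)
            text = str(node).strip()
        else:
            # Tags: join all nested strings with single spaces
            text = node.get_text(" ", strip=True)
        if text:
            parts.append(text)
    return parts

soup = BeautifulSoup(SAMPLE, "html.parser")
start = soup.find("div", {"class": "lead"})
end = soup.find("div", {"class": "image"})
result = text_between(start, end)
print(result)  # ['Heading', 'First paragraph', 'Cell']
```

If the two divs are not siblings, next_siblings will never reach the end marker, so a different traversal (for example next_elements) would be needed.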