I can extract the field I need from a website with this code:
import bs4
import requests
response = requests.get(index_url)
soup = bs4.BeautifulSoup(response.text, "lxml")
poem = soup.select('div.siir p[id^=siir]')
print(poem)
But the output still contains the HTML tags. I'm trying to strip them with the get_text() function, but when I call it like this:
print(poem.get_text())
I get this error:
AttributeError: 'list' object has no attribute 'get_text'
I also tried calling it like this:
poem = soup.select('div.siir p[id^=siir]').get_text()
I get the same error again. How can I remove the HTML tags once I have selected the correct field?
soup.select() always returns a list of elements, even when only one element matches. A list has no get_text() method, so call get_text() on each element in turn:
for element in poem:
    print(element.get_text())
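A minimal self-contained sketch of the same idea, assuming a hypothetical HTML snippet in place of the real page (only the selector div.siir p[id^=siir] comes from the question). It uses the built-in "html.parser" so lxml is not required, and joins the text of every matching paragraph into one string:

```python
import bs4

# Hypothetical markup standing in for the real page; only the
# structure div.siir > p[id^=siir] is taken from the question.
html = """
<div class="siir">
  <p id="siir1">First <b>line</b></p>
  <p id="siir2">Second line</p>
  <p id="other">Not matched</p>
</div>
"""

# "html.parser" ships with Python; "lxml" behaves the same here if installed.
soup = bs4.BeautifulSoup(html, "html.parser")

# select() returns a list-like ResultSet of Tag objects.
poem = soup.select('div.siir p[id^=siir]')

# get_text() is a method of each Tag, not of the list.
lines = [element.get_text() for element in poem]
text = "\n".join(lines)
print(text)
```

The nested <b> tag is dropped by get_text(), and the paragraph whose id does not start with "siir" is never selected.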
If you expected just one element, then extract it with indexing:
print(poem[0].get_text())
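If only one element is expected, Beautiful Soup 4.4 and later also provide select_one(), which returns the first matching Tag or None instead of a list. A minimal sketch, again assuming hypothetical markup; the None check avoids the IndexError that poem[0] would raise when nothing matches:

```python
import bs4

# Hypothetical markup; only the selector comes from the question.
html = '<div class="siir"><p id="siir1">Only <i>stanza</i></p></div>'
soup = bs4.BeautifulSoup(html, "html.parser")

# select_one() returns the first matching Tag, or None if there is no match.
element = soup.select_one('div.siir p[id^=siir]')
text = element.get_text() if element is not None else ""
print(text)

# A selector that matches nothing yields None rather than an empty list.
missing = soup.select_one('div.missing p')
print(missing is None)
```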