
Python BeautifulSoup: How to get text from self-closing tags

I'm trying to parse the contents of an Evernote checklist using BeautifulSoup. But when I run the HTML parser on the contents, it keeps correcting the self-closing tags (en-todo), so when I try to get the text of the en-todo tags, it's blank.

note_body = '<en-todo checked="true" />window caulk<en-todo />cake pan<en-todo />cake mix<en-todo />salad mix<en-todo checked="true"/>painters tape<br />'

from bs4 import BeautifulSoup
soup = BeautifulSoup(note_body, 'html.parser')
checklist_items = soup.find_all('en-todo')
print(checklist_items)

The above code returns just the tags, without any of the text.

[<en-todo checked="true"></en-todo>, <en-todo></en-todo>, <en-todo></en-todo>, <en-todo></en-todo>, <en-todo checked="true"></en-todo>]

1 Answer

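The text you want isn't enclosed in the en-todo tags. The parser treats each en-todo as an empty element, so the text that follows it is a separate node sitting next to the tag, not inside it.

Use tag.next_sibling to reach that text: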

>>> [each.next_sibling for each in checklist_items]
[u'window caulk', u'cake pan', u'cake mix', u'salad mix', u'painters tape']
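
If you also need to know which items are ticked, the same idea can be extended to read the checked attribute alongside the sibling text. This is only a minimal sketch, assuming Python 3 and the note_body from the question; todo_items is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup, NavigableString

# Same sample markup as in the question
note_body = ('<en-todo checked="true" />window caulk<en-todo />cake pan'
             '<en-todo />cake mix<en-todo />salad mix'
             '<en-todo checked="true"/>painters tape<br />')

soup = BeautifulSoup(note_body, 'html.parser')


def todo_items(soup):
    """Hypothetical helper: return (text, checked) pairs for each en-todo."""
    items = []
    for tag in soup.find_all('en-todo'):
        sibling = tag.next_sibling
        # Only take the sibling if it is plain text, not another tag or None
        text = sibling.strip() if isinstance(sibling, NavigableString) else ''
        # The attribute is missing on unchecked items, so get() returns None
        checked = tag.get('checked') == 'true'
        items.append((text, checked))
    return items


print(todo_items(soup))
```

With the sample note this should print [('window caulk', True), ('cake pan', False), ('cake mix', False), ('salad mix', False), ('painters tape', True)]. The isinstance check guards against an en-todo that is followed directly by another tag or by nothing at all.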
Keerthana Prabhakaran answered Mar 08 '26 22:03
