Python BeautifulSoup: How to get text from self-closing tags

Question

I'm trying to parse the contents of an Evernote checklist using BeautifulSoup. But when I call the HTML parser on the contents, it keeps correcting the self-closing tags (en-todo), so when I try to get the text of the en-todo tags, it's blank.

note_body = '<en-todo checked="true" />window caulk<en-todo />cake pan<en-todo />cake mix<en-todo />salad mix<en-todo checked="true"/>painters tape<br />'

from bs4 import BeautifulSoup

soup = BeautifulSoup(note_body, 'html.parser')
checklist_items = soup.find_all('en-todo')
print(checklist_items)

The above code returns just the tags, without any of the text.

[<en-todo checked="true"></en-todo>, <en-todo></en-todo>, <en-todo></en-todo>, <en-todo></en-todo>, <en-todo checked="true"></en-todo>]

Keerthana Prabhakaran · Accepted Answer

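The text you want isn't enclosed in the en-todo tags. The parser closes each self-closing tag immediately, so the item text ends up as a separate text node right after the tag.

You can reach it with tag.next_sibling: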

>>> [each.next_sibling for each in checklist_items]
[u'window caulk', u'cake pan', u'cake mix', u'salad mix', u'painters tape']
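
If you also want the checked state of each item, the same idea can be extended a little. This is only a sketch under a few assumptions: it uses Python 3, the names `items` and `todo` are made up for illustration, and it assumes that an item counts as checked only when the tag carries `checked="true"`, as in the sample note above. The isinstance check is there so that a tag with no text after it (or with another tag right after it) is skipped instead of raising an error.

```python
from bs4 import BeautifulSoup, NavigableString

note_body = (
    '<en-todo checked="true" />window caulk<en-todo />cake pan'
    '<en-todo />cake mix<en-todo />salad mix'
    '<en-todo checked="true"/>painters tape<br />'
)

soup = BeautifulSoup(note_body, 'html.parser')

# Collect (text, checked) pairs; `items` is an illustrative name.
items = []
for todo in soup.find_all('en-todo'):
    text = todo.next_sibling
    # next_sibling can be None or another tag, so keep only text nodes.
    if isinstance(text, NavigableString):
        # Assumption: only checked="true" marks an item as done.
        checked = todo.get('checked') == 'true'
        items.append((text.strip(), checked))

print(items)
# [('window caulk', True), ('cake pan', False), ('cake mix', False),
#  ('salad mix', False), ('painters tape', True)]
```

Pairing each tag with its following text node keeps the text and the checked attribute together, which is harder to do if you pull the strings out on their own.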

Tags:

python

beautifulsoup
