I'm writing an analysis tool that counts how many children each HTML tag in the source code has.
I parsed the code with BeautifulSoup, and now I want to iterate over every tag in the page and count how many children it has.
What is the best way to iterate over all the tags? And how can I, for example, get all the tags that do not have any children?
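If you call find_all() with no arguments, you can iterate over every tag.
You can get the number of children a tag has with len(tag.contents). Note that contents includes text nodes as well as tags, so a tag that contains only text counts as having one child.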
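As a rough sketch of the iteration described above, the following counts the children of every tag in a small document. The sample markup and the name child_counts are made up for illustration and are not part of the original question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = "<html><body><div><p>Hello</p><br/></div><span></span></body></html>"
soup = BeautifulSoup(html, "html.parser")

# find_all() with no arguments yields every tag in document order.
# len(tag.contents) counts direct children, text nodes included.
child_counts = [(tag.name, len(tag.contents)) for tag in soup.find_all()]

for name, count in child_counts:
    print(name, count)
```

Here the p tag reports one child because its text counts as a child node.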
To get a list of all tags with no children:
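from bs4 import BeautifulSoup

# Parse the file; the with block closes it once parsing is done.
with open('someHTMLFile.html', 'r') as html_file:
    soup = BeautifulSoup(html_file, 'html.parser')

body = soup.body
empty_tags = []
for tag in body.find_all():
    # contents holds all direct children, text nodes included.
    if len(tag.contents) == 0:
        empty_tags.append(tag)
print(empty_tags)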
Or, as a list comprehension:
empty_tags = [tag for tag in soup.body.find_all() if len(tag.contents) == 0]
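If "children" should mean child tags only, one possible variation is to count with find_all(recursive=False), which returns just the direct child tags and skips text. This is a sketch under that assumption; the helper count_child_tags and the sample markup are hypothetical names introduced here.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = "<div><p>Hello</p>\n<ul><li>a</li><li>b</li></ul></div>"
soup = BeautifulSoup(html, "html.parser")

def count_child_tags(tag):
    # recursive=False limits the search to direct children;
    # find_all() returns tags only, so text nodes are not counted.
    return len(tag.find_all(recursive=False))

# Tags with no child tags (they may still contain text).
leaf_tags = [tag.name for tag in soup.find_all() if count_child_tags(tag) == 0]
print(leaf_tags)
```

With this approach a tag such as p that holds only text is treated as childless, whereas len(tag.contents) would report one child for it.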