Get tag's children count with BeautifulSoup

I'm writing an analysis tool that counts how many children each HTML tag in the source code has.

I parsed the code with BeautifulSoup, and now I want to iterate over every tag in the page and count how many children it has.

What is the best way to iterate over all the tags? For example, how can I get all the tags that have no children?

Dan asked Oct 26 '25 at 15:10



1 Answer

Calling find_all() with no arguments returns every tag in the document (or under the tag you call it on), so you can iterate over the result.
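A minimal sketch of that iteration, assuming the HTML comes from an inline string rather than a file (the markup is invented for illustration). It shows that find_all() with no arguments visits every tag in document order and skips text nodes:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = "<html><head><title>T</title></head><body><p>Hi <b>there</b></p><br/></body></html>"
soup = BeautifulSoup(html, "html.parser")

# find_all() with no arguments yields every Tag (text nodes are skipped),
# in document order.
tag_names = [tag.name for tag in soup.find_all()]
print(tag_names)  # ['html', 'head', 'title', 'body', 'p', 'b', 'br']
```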

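You can get how many direct children a tag has with len(tag.contents). Note that .contents includes text nodes (NavigableString objects) as well as tags; to count only child tags, use len(tag.find_all(recursive=False)).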
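For the analysis tool in the question, a per-tag report could look like the sketch below. child_counts is a hypothetical helper name and the markup is invented. It records both counts: .contents includes text nodes, while find_all(recursive=False) counts only direct child tags.

```python
from bs4 import BeautifulSoup


def child_counts(soup):
    """Hypothetical helper: (tag name, direct children, direct child tags) for every tag."""
    return [
        (tag.name, len(tag.contents), len(tag.find_all(recursive=False)))
        for tag in soup.find_all()
    ]


# Hypothetical sample markup: <p> holds two text nodes and one <b> tag.
html = "<div><p>Hello <b>world</b>!</p><br/></div>"
soup = BeautifulSoup(html, "html.parser")

report = child_counts(soup)
print(report)  # [('div', 2, 2), ('p', 3, 1), ('b', 1, 0), ('br', 0, 0)]
```

Keeping both numbers makes it easy to decide later whether text should count as a "child" for your analysis.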

To get a list of all tags with no children:

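from bs4 import BeautifulSoup

# Parse the file; the with-block closes it once parsing is done.
with open('someHTMLFile.html', 'r') as f:
    soup = BeautifulSoup(f, 'html.parser')

# html.parser does not add a <body> if the file lacks one,
# so fall back to the whole document in that case.
body = soup.body if soup.body is not None else soup

empty_tags = []

# find_all() with no arguments yields every descendant tag.
for tag in body.find_all():
    # .contents holds the direct children, text nodes included.
    if len(tag.contents) == 0:
        empty_tags.append(tag)

print(empty_tags)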

Or, as a list comprehension:

empty_tags = [tag for tag in soup.body.find_all() if len(tag.contents) == 0]
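Checking .contents treats a tag that holds only text as having children. If "no children" should instead mean "no child tags", a variant like this sketch (with invented markup) compares the two definitions:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration.
html = "<div><p>Hello <b>world</b>!</p><br/><span></span></div>"
soup = BeautifulSoup(html, "html.parser")

# Tags with no children at all (neither text nodes nor tags).
empty_tags = [tag.name for tag in soup.find_all() if not tag.contents]

# Leaf tags: no direct child *tags*, though they may still contain text.
leaf_tags = [tag.name for tag in soup.find_all() if not tag.find_all(recursive=False)]

print(empty_tags)  # ['br', 'span']
print(leaf_tags)   # ['b', 'br', 'span']
```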
Ullauri answered Oct 29 '25 at 18:10
