I'm attempting to get a list of div ids from a page. When I print out the attributes, I get the ids listed.
for tag in soup.find_all(class_="bookmark blurb group"):
    print(tag.attrs)
results in:
{'id': 'bookmark_8199633', 'role': 'article', 'class': ['bookmark', 'blurb', 'group']}
{'id': 'bookmark_7744613', 'role': 'article', 'class': ['bookmark', 'blurb', 'group']}
{'id': 'bookmark_7338591', 'role': 'article', 'class': ['bookmark', 'blurb', 'group']}
{'id': 'bookmark_7338535', 'role': 'article', 'class': ['bookmark', 'blurb', 'group']}
{'id': 'bookmark_4530078', 'role': 'article', 'class': ['bookmark', 'blurb', 'group']}
So I know there ARE ids. However, when I print out tag.id instead, I just get a list of "None". What am I doing wrong here?
Going down. Tags are among the most important elements of any HTML document, and a tag may contain other tags and strings (the tag's children). Beautiful Soup provides several ways to navigate and iterate over a tag's children.
Approach: First import the regular expression (re) and BeautifulSoup libraries. Then open the HTML file you want to parse with the open function. Finally, call find_all, passing the tag to look for and the text that tag should contain.
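To make the idea of navigating downward concrete, here is a minimal sketch. The HTML snippet is made up for illustration (it is not from the page in the question), and the built-in html.parser is assumed as the parser.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical markup, written without whitespace between tags so that
# no whitespace-only strings appear among the children.
html = '<div id="outer"><p>first</p><p>second <b>bold</b></p></div>'
soup = BeautifulSoup(html, "html.parser")
outer = soup.find("div")

# .children iterates over direct children only.
child_names = [c.name for c in outer.children if isinstance(c, Tag)]

# .descendants walks the whole subtree; text nodes are filtered out here.
descendant_names = [n.name for n in outer.descendants if isinstance(n, Tag)]

print(child_names)
print(descendant_names)
```

The isinstance check is there because both iterators also yield the text nodes inside each tag, not just the tags themselves.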
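The approach described above could look roughly like the sketch below. It uses an inline, made-up HTML string instead of a file so that it runs on its own, and it assumes a Beautiful Soup version that accepts the string argument (older releases call it text).

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the contents of an HTML file.
# With a real file you would pass the object returned by open(...) instead.
html = "<ul><li>red apple</li><li>green pear</li><li>green apple</li></ul>"
soup = BeautifulSoup(html, "html.parser")

# Find <li> tags whose text contains "apple".
matches = soup.find_all("li", string=re.compile("apple"))
texts = [tag.get_text() for tag in matches]
print(texts)
```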
You can access a tag's attributes by treating the tag like a dictionary (documentation):
for tag in soup.find_all(class_="bookmark blurb group"):
    print(tag.get('id'))
The reason tag.id didn't work is that it is equivalent to tag.find('id'), which returns None since no child tag named id is found (documentation).
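A small self-contained sketch of the difference, using made-up markup modeled on the attributes printed in the question (the real page is not available here, so the HTML string is an assumption):

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the attributes shown in the question.
html = (
    '<div id="bookmark_8199633" role="article" class="bookmark blurb group"></div>'
    '<div id="bookmark_7744613" role="article" class="bookmark blurb group"></div>'
)
soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all(class_="bookmark blurb group")

# Attribute-style access searches for a child tag named <id>, so it is None here.
child_lookups = [tag.id for tag in tags]

# Dictionary-style access reads the id attribute itself.
ids = [tag["id"] for tag in tags]            # raises KeyError if id is missing
safe_ids = [tag.get("id") for tag in tags]   # gives None if id is missing

print(child_lookups)
print(ids)
```

tag.get('id') is the safer choice when some of the matched tags might not carry an id at all.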
This solution lists every tag on the page that has an id. It might be helpful too.
tags = page_soup.find_all()
for tag in tags:
    if 'id' in tag.attrs:
        print(tag.name, tag['id'], sep='->')
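A possible shorter variant, sketched here as an alternative rather than a replacement: passing id=True to find_all asks Beautiful Soup for only the tags that have an id attribute, so the membership check is not needed. The HTML string is a made-up example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two tags with an id, one without.
html = '<div id="main"><p id="intro">hi</p><span>no id</span></div>'
page_soup = BeautifulSoup(html, "html.parser")

# id=True matches any tag that has an id attribute, whatever its value.
pairs = [(tag.name, tag["id"]) for tag in page_soup.find_all(id=True)]
for name, tag_id in pairs:
    print(name, tag_id, sep="->")
```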