I am trying to get a list of all HTML tags from Beautiful Soup.
I see find_all(), but I have to know the name of the tag before I search.
Given text like:
html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>"""
How would I get a list like this?
list_of_tags = ["<div>", "<div>", "<div class='magical'>", "<p>"]
I know how to do this with regex, but I am trying to learn BS4.
To get all the HTML tags of a web page with the BeautifulSoup library, import BeautifulSoup along with the requests library, which makes the GET request to the page. Beautiful Soup then parses the returned HTML so it can be scraped.
Step-by-step approach:
Step 1: Import the required modules: bs4 (Beautiful Soup) for parsing and requests for fetching the page.
Step 2: Request the URL by calling the get method from requests.
Step 3: Parse the response text with BeautifulSoup and collect the tags.
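The steps above can be sketched as follows. This is a minimal sketch, not a definitive implementation. The function names tags_from_html and tags_from_url are hypothetical helpers. The parser is assumed to be the built-in "html.parser". requests is imported inside the fetching function so that the parsing part works without it installed.

```python
from bs4 import BeautifulSoup


def tags_from_html(html):
    """Return the name of every tag in an HTML string, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() with no arguments matches every tag in the tree.
    return [tag.name for tag in soup.find_all()]


def tags_from_url(url, timeout=10):
    """Hypothetical helper: fetch a page with requests and list its tag names."""
    # Imported here so tags_from_html() is usable without requests installed.
    import requests

    response = requests.get(url, timeout=timeout)
    # Stop on 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return tags_from_html(response.text)
```

Usage: tags_from_url("https://example.com") fetches that placeholder URL and returns a list of tag names such as "html", "head" and "body". Substitute the page you actually want to inspect.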
You don't have to specify any arguments to find_all(). Called with none, BeautifulSoup will find every tag in the tree, recursively.
Sample:
from bs4 import BeautifulSoup
html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>
"""
soup = BeautifulSoup(html, "html.parser")
print([tag.name for tag in soup.find_all()])
# ['div', 'div', 'div', 'p']
print([str(tag) for tag in soup.find_all()])
# ['<div>something</div>', '<div>something else</div>', '<div class="magical">hi there</div>', '<p>ok</p>']
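Neither of the lists above is quite the list of opening tags the question asks for. One way to get it is to rebuild each opening tag from tag.name and tag.attrs. This is a sketch under stated assumptions. opening_tag is a hypothetical helper. It writes every attribute with single quotes to match the question's expected output, it does not escape quotes inside attribute values, and a valueless attribute (e.g. disabled) comes out as disabled=''.

```python
from bs4 import BeautifulSoup


def opening_tag(tag):
    """Hypothetical helper: rebuild an opening tag like <div class='magical'>."""
    parts = [tag.name]
    for key, value in tag.attrs.items():
        # Multi-valued attributes such as class are parsed into lists; join them back.
        if isinstance(value, list):
            value = " ".join(value)
        parts.append(f"{key}='{value}'")
    return "<" + " ".join(parts) + ">"


html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>"""

soup = BeautifulSoup(html, "html.parser")
list_of_tags = [opening_tag(tag) for tag in soup.find_all()]
print(list_of_tags)
# ['<div>', '<div>', "<div class='magical'>", '<p>']
```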
Alternatively, pass True to find_all(), which matches every tag:
# soup is the BeautifulSoup object from above; find_all is the current name for findAll
for tag in soup.find_all(True):
    print(tag.name)
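To check the claim that find_all(True) and find_all() both walk the whole tree, here is a small self-contained comparison on a nested snippet. The sample HTML is made up for illustration. It also shows the recursive=False argument, which limits the search to the direct children of the object you call it on.

```python
from bs4 import BeautifulSoup

html = "<div><p>one <span>two</span></p></div><p>three</p>"
soup = BeautifulSoup(html, "html.parser")

# With no filter, or with True, every tag is matched, nested ones included.
names_true = [tag.name for tag in soup.find_all(True)]
names_default = [tag.name for tag in soup.find_all()]
print(names_true)  # ['div', 'p', 'span', 'p']

# recursive=False only looks at the top-level tags.
top_level = [tag.name for tag in soup.find_all(True, recursive=False)]
print(top_level)  # ['div', 'p']
```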
I thought I'd share my solution to a very similar question for those who find themselves here later.
I needed to find all tags quickly but only wanted unique values. I'll use the Python calendar module to demonstrate.
We'll generate an HTML calendar, then parse it, keeping only the unique tag names present.
The structure below is very similar to the answers above, but collects the names into a set instead of a list:
from bs4 import BeautifulSoup
import calendar
html_cal = calendar.HTMLCalendar().formatmonth(2020, 1)
{tag.name for tag in BeautifulSoup(html_cal, 'html.parser').find_all()}
# Result
# {'table', 'td', 'th', 'tr'}
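A set does not remember the order in which tags appear. If you want unique names in first-seen order, a minimal variant uses dict.fromkeys. It relies on dicts keeping insertion order, which holds from Python 3.7 onward. The same calendar input is used so the result can be compared with the set above.

```python
import calendar

from bs4 import BeautifulSoup

html_cal = calendar.HTMLCalendar().formatmonth(2020, 1)
soup = BeautifulSoup(html_cal, "html.parser")

# dict keys are unique and keep insertion order, so this de-duplicates
# while preserving the order in which each tag name first appears.
unique_in_order = list(dict.fromkeys(tag.name for tag in soup.find_all()))
print(unique_in_order)  # ['table', 'tr', 'th', 'td']
```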