Get all HTML tags with Beautiful Soup

Tags: python, html, beautifulsoup

I am trying to get a list of all HTML tags in a document using Beautiful Soup.

I see find_all(), but it seems I have to know the name of the tag before I search.

Given HTML like this:

html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>"""

how would I get a list like this?

list_of_tags = ["<div>", "<div>", "<div class='magical'>", "<p>"]

I know how to do this with regex, but I am trying to learn BS4.

858

asked Mar 19 '16 23:03

humanbeing

3 Answers

You don't have to specify any arguments to find_all(): called with no arguments, it returns every tag in the tree, recursively.

Sample:

from bs4 import BeautifulSoup

html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>
"""
soup = BeautifulSoup(html, "html.parser")

print([tag.name for tag in soup.find_all()])
# ['div', 'div', 'div', 'p']

print([str(tag) for tag in soup.find_all()])
# ['<div>something</div>', '<div>something else</div>', '<div class="magical">hi there</div>', '<p>ok</p>']
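The question asks for opening tags only, which neither list above provides directly. Here is a minimal sketch, assuming it is acceptable to rebuild each opening tag from tag.name and tag.attrs. The helper opening_tag is hypothetical (it is not part of Beautiful Soup); it single-quotes attribute values and does not escape quotes inside them:

```python
from bs4 import BeautifulSoup

html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>"""


def opening_tag(tag):
    """Rebuild an opening tag string from a Tag's name and attributes.

    Hypothetical helper for illustration only; values are not escaped.
    """
    parts = [tag.name]
    for key, value in tag.attrs.items():
        # Multi-valued attributes such as class are stored as lists.
        if isinstance(value, list):
            value = " ".join(value)
        parts.append(f"{key}='{value}'")
    return "<" + " ".join(parts) + ">"


soup = BeautifulSoup(html, "html.parser")
list_of_tags = [opening_tag(tag) for tag in soup.find_all()]
print(list_of_tags)
```

The strings are reconstructed rather than copied from the source, so the original quoting and whitespace inside each tag are not preserved.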

113

answered Oct 18 '22 20:10

alecxe

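Passing True to find_all() matches every tag (findAll is the Beautiful Soup 3 name, kept in bs4 as a legacy alias of find_all). This assumes soup is an already-parsed BeautifulSoup object:

for tag in soup.find_all(True):
    print(tag.name)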
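For a self-contained check, the sketch below parses a small made-up snippet (the HTML string is an assumption for illustration) and compares the result of passing True with the result of passing no arguments:

```python
from bs4 import BeautifulSoup

# Illustrative input, not taken from the question.
html = "<div>something</div><p>ok <b>bold</b></p>"
soup = BeautifulSoup(html, "html.parser")

# True matches any tag name, so both calls walk the whole tree.
names_true = [tag.name for tag in soup.find_all(True)]
names_default = [tag.name for tag in soup.find_all()]

print(names_true)
print(names_true == names_default)
```

Nested tags such as the b inside the p are included because the search is recursive; text nodes are not, since only tags are matched.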

answered Oct 18 '22 19:10

Anjan

I thought I'd share my solution to a very similar problem, for those who find themselves here later.

Example

I needed to find all tags quickly but only wanted the unique names. I'll use the Python calendar module to demonstrate.

We'll generate an HTML calendar, then parse it and collect only the distinct tag names present.

The structure is very similar to the answers above, except that the names are collected into a set:

from bs4 import BeautifulSoup
import calendar

html_cal = calendar.HTMLCalendar().formatmonth(2020, 1)
unique_tags = {tag.name for tag in BeautifulSoup(html_cal, 'html.parser').find_all()}
print(unique_tags)

# Result (sets are unordered, so the display order may vary)
# {'table', 'td', 'th', 'tr'}
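If the order in which tags first appear matters, a set will not keep it. One possible variation, sketched here on the assumption that first-occurrence order is what you want, deduplicates with dict.fromkeys instead:

```python
from bs4 import BeautifulSoup
import calendar

html_cal = calendar.HTMLCalendar().formatmonth(2020, 1)
soup = BeautifulSoup(html_cal, "html.parser")

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique_in_order = list(dict.fromkeys(tag.name for tag in soup.find_all()))
print(unique_in_order)
```

This works because dictionaries preserve insertion order in Python 3.7 and later.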

answered Oct 18 '22 18:10

Jason R Stevens CFA
