
Python, Beautiful Soup: get all class names

Given some HTML, for example:

<div class="class1">
    <span class="class2">some text</span>
    <span class="class3">some text</span>
    <span class="class4">some text</span>
</div>

How can I retrieve all the class names, i.e. ['class1', 'class2', 'class3', 'class4']?

I tried:

soup.find_all(class_=True)

But this returns the whole tags, so I would then need to run a regex over each tag's string to extract the class names.

woshitom asked May 03 '17 05:05



1 Answer

You can treat each Tag instance returned by find_all() like a dictionary when retrieving its attributes. Note that the value of the class attribute is a list, because class is a special "multi-valued" attribute:

classes = []
for element in soup.find_all(class_=True):
    classes.extend(element["class"])

Or:

classes = [value
           for element in soup.find_all(class_=True)
           for value in element["class"]]

Demo:

from bs4 import BeautifulSoup

data = """
<div class="class1">
<span class="class2">some text</span>
<span class="class3">some text</span>
<span class="class4">some text</span>
</div>
"""

soup = BeautifulSoup(data, "html.parser")

classes = [value
           for element in soup.find_all(class_=True)
           for value in element["class"]]

print(classes)

# Prints:
# ['class1', 'class2', 'class3', 'class4']
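
Because class is multi-valued, an element such as <div class="box note"> contributes two entries to the result, and the same name can show up more than once across elements. Here is a minimal sketch of collecting each name only once, in document order. The sample markup and the variable name unique_classes are illustrative assumptions, not part of the original question:

```python
from bs4 import BeautifulSoup

# Illustrative markup: the div carries two classes, and "note" repeats.
data = """
<div class="box note">
<span class="note">some text</span>
<span class="label">some text</span>
</div>
"""

soup = BeautifulSoup(data, "html.parser")

# element["class"] is a list, so flatten it across every element with a class.
classes = [value
           for element in soup.find_all(class_=True)
           for value in element["class"]]

# dict.fromkeys drops duplicates while keeping first-seen order (Python 3.7+).
unique_classes = list(dict.fromkeys(classes))

print(classes)
print(unique_classes)
```

If order does not matter, set(classes) would work as well; dict.fromkeys is used here only to keep the names in the order they appear in the document.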
alecxe answered Oct 22 '22 04:10
