Get contents by class names using Beautiful Soup

Tags: python, beautifulsoup

Using the Beautiful Soup module, how can I get the data of a div tag whose class name is feeditemcontent cxfeeditemcontent? Is it:

soup.class['feeditemcontent cxfeeditemcontent']

or:

soup.find_all('class')

This is the HTML source:

<div class="feeditemcontent cxfeeditemcontent">
    <div class="feeditembodyandfooter">
         <div class="feeditembody">
         <span>The actual data is some where here</span>
         </div>
     </div>
 </div>

and this is the Python code:

 from BeautifulSoup import BeautifulSoup
 html_doc = open('home.jsp.html', 'r')

 soup = BeautifulSoup(html_doc)
 class="feeditemcontent cxfeeditemcontent"

826

asked Jul 04 '12 14:07

Rajeev

3 Answers

Beautiful Soup 4 treats the value of the "class" attribute as a list rather than a string, meaning jadkik94's solution can be simplified:

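from bs4 import BeautifulSoup

# Markup from the question.
html = """<div class="feeditemcontent cxfeeditemcontent">
<div class="feeditembodyandfooter"><div class="feeditembody">
<span>The actual data is some where here</span>
</div></div></div>"""

def match_class(target):
    # Build a filter that accepts tags carrying every class in target.
    def do_match(tag):
        classes = tag.get('class', [])
        return all(c in classes for c in target)
    return do_match

soup = BeautifulSoup(html, "html.parser")
print(soup.find_all(match_class(["feeditemcontent", "cxfeeditemcontent"])))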
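
Once the outer div is matched, the text inside it can be pulled out along these lines. This is only a minimal sketch, assuming Beautiful Soup 4 with the built-in html.parser; the variable names outer, classes and text are illustrative and not part of the answer above.

```python
from bs4 import BeautifulSoup

html = """<div class="feeditemcontent cxfeeditemcontent">
    <div class="feeditembodyandfooter">
         <div class="feeditembody">
         <span>The actual data is some where here</span>
         </div>
     </div>
 </div>"""

soup = BeautifulSoup(html, "html.parser")

# Passing a single class name matches any tag whose class list contains it.
outer = soup.find("div", class_="feeditemcontent")

# In bs4 the class attribute is exposed as a list of class names.
classes = outer["class"]

# get_text(strip=True) gathers the text of all descendants, with each
# piece stripped and whitespace-only pieces dropped.
text = outer.get_text(strip=True)

print(classes)
print(text)
```

Matching on one class name is usually enough when that name is unique on the page; the match_class filter is useful when several class names must all be present.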

180

answered Sep 21 '22 15:09

Leonard Richardson

Try this (written for Beautiful Soup 3 and Python 2). It may be overkill for something this simple, but it works:

def match_class(target):
    target = target.split()
    def do_match(tag):
        try:
            classes = dict(tag.attrs)["class"]
        except KeyError:
            classes = ""
        classes = classes.split()
        return all(c in classes for c in target)
    return do_match

html = """<div class="feeditemcontent cxfeeditemcontent">
<div class="feeditembodyandfooter">
<div class="feeditembody">
<span>The actual data is some where here</span>
</div>
</div>
</div>"""

from BeautifulSoup import BeautifulSoup

soup = BeautifulSoup(html)

matches = soup.findAll(match_class("feeditemcontent cxfeeditemcontent"))
for m in matches:
    print m
    print "-"*10

matches = soup.findAll(match_class("feeditembody"))
for m in matches:
    print m
    print "-"*10
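
For readers on Python 3 and Beautiful Soup 4, the same idea might be adapted roughly as follows. This is a sketch rather than the original author's code: it keeps the space-separated string interface but assumes bs4's behaviour of returning the class attribute as a list, and the names wanted, outer_matches and body_matches are made up for illustration.

```python
from bs4 import BeautifulSoup

def match_class(target):
    # Accept a space-separated string of class names, as in the code above.
    wanted = set(target.split())
    def do_match(tag):
        # bs4 returns the class attribute as a list; default to an empty list.
        return wanted.issubset(tag.get("class", []))
    return do_match

html = """<div class="feeditemcontent cxfeeditemcontent">
<div class="feeditembodyandfooter">
<div class="feeditembody">
<span>The actual data is some where here</span>
</div>
</div>
</div>"""

soup = BeautifulSoup(html, "html.parser")

outer_matches = soup.find_all(match_class("feeditemcontent cxfeeditemcontent"))
for m in outer_matches:
    print(m)
    print("-" * 10)

body_matches = soup.find_all(match_class("feeditembody"))
for m in body_matches:
    print(m)
    print("-" * 10)
```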

answered Sep 17 '22 15:09

jadkik94

soup.findAll("div", class_="feeditemcontent cxfeeditemcontent")

So, if I want to get all div tags with the class header (<div class="header">) from stackoverflow.com, an example with BeautifulSoup would be something like:

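from bs4 import BeautifulSoup as bs
import requests

url = "http://stackoverflow.com/"
html = requests.get(url).text
# Naming the parser explicitly avoids bs4's "no parser specified" warning.
soup = bs(html, "html.parser")

# class_ has a trailing underscore because class is a reserved word in Python.
tags = soup.findAll("div", class_="header")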

This is covered in the bs4 documentation, in the section on searching by CSS class.
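
One caveat worth knowing: when class_ is given a string containing several class names, bs4 compares it against the exact value of the class attribute, so the order of the names matters. The sketch below illustrates this with a made-up one-line document in which the two classes appear in the opposite order, and shows a CSS selector via select() as an order-independent alternative. The variable names exact and by_selector are illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same two classes as in the question, but reversed.
html = '<div class="cxfeeditemcontent feeditemcontent"><span>data</span></div>'
soup = BeautifulSoup(html, "html.parser")

# The multi-class string must equal the attribute value exactly,
# so the reversed order in the markup is not matched.
exact = soup.find_all("div", class_="feeditemcontent cxfeeditemcontent")

# A CSS selector matches any div carrying both classes, in any order.
by_selector = soup.select("div.feeditemcontent.cxfeeditemcontent")

print(len(exact), len(by_selector))
```

If the class order in the page is not guaranteed, the selector form is likely the safer choice.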

answered Sep 19 '22 15:09

Aziz Alto
