BeautifulSoup: Get the class text

Assuming that the following code:

for data in soup.findAll('div',{'class':'value'}):
    print(data)

gives the following output:

<div class="value">
<p class="name">Michael Jordan</p>
</div>


<div class="value">
<p class="team">Real Madrid</p>
</div>


<div class="value">
<p class="Sport">Ping Pong</p>
</div>

I want to create the following dictionary:

  Person = {'name': 'Michael Jordan', 'team': 'Real Madrid', 'Sport': 'Ping Pong'}

I can get the text using data.text, but how can I get the class name of each <p> tag so I can use it as the dictionary key (Person[key1], Person[key2], ...)?

Mpizos Dimitris asked Jan 04 '16 10:01


3 Answers

You could use the following:

content = '''
<div class="value">
<p class="name">Michael Jordan</p>
</div>

<div class="value">
<p class="team">Real Madrid</p>
</div>

<div class="value">
<p class="Sport">Ping Pong</p>
</div>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(content, 'html.parser')  # name the parser explicitly to avoid a warning

person = {}

for div in soup.findAll('div', {'class': 'value'}):
    person[div.find('p').attrs['class'][0]] = div.text.strip()

print(person)

Output (from Python 2; Python 3 prints the same pairs without the u prefixes)

{'Sport': u'Ping Pong', 'name': u'Michael Jordan', 'team': u'Real Madrid'}
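
One caveat with the loop above: attrs['class'] raises a KeyError if a <p> has no class, and div.find('p') returns None if a div has no <p>. The following is a minimal sketch of a more defensive version, not a replacement for the answer. The last two div elements in the sample HTML are hypothetical additions, included only to exercise those cases.

```python
from bs4 import BeautifulSoup

# Sample markup; the last two divs are hypothetical edge cases
html = '''
<div class="value"><p class="name">Michael Jordan</p></div>
<div class="value"><p class="team">Real Madrid</p></div>
<div class="value"><p>no class here</p></div>
<div class="value"></div>
'''

soup = BeautifulSoup(html, 'html.parser')

person = {}
for div in soup.find_all('div', class_='value'):
    p = div.find('p')
    if p is None:
        continue  # this div has no <p> child
    classes = p.get('class')  # list of class names, or None if the attribute is absent
    if not classes:
        continue  # nothing to use as a key
    person[classes[0]] = p.get_text(strip=True)

print(person)
```

Using p.get('class') instead of p.attrs['class'] means a missing attribute yields None rather than an exception, so such entries can simply be skipped.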
gtlambert answered Nov 10 '22 07:11


You can do it like this:

person = {}
for data in soup.find_all('div', {'class': 'value'}):
    # each div.value holds one <p>; its first class name becomes the key
    attr = data.p.attrs.get("class")[0]
    value = data.p.text
    person[attr] = value

print(person)
salmanwahed answered Nov 10 '22 07:11


Using this snippet

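from bs4 import BeautifulSoup

soup = BeautifulSoup('''<div class="value">
        <p class="Sport other-name-class other">Ping Pong</p>
       </div>''', 'html.parser')

p = soup.select_one('div.value p')  # find() does not accept CSS selectors; select_one() does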

I found two ways that give the same result. You can use

p.get_attribute_list('class')

or

p.attrs['class']

Both return a list with all the class names, like this: ['Sport', 'other-name-class', 'other']
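
To tie this back to the question: because class is a multi-valued attribute, you have to decide which part of the list becomes the dictionary key. Here is a rough sketch of two options, taking the first class name or joining all of them. The variable names first_key and joined_key are only illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<div class="value"><p class="Sport other-name-class other">Ping Pong</p></div>',
    'html.parser',
)
p = soup.select_one('div.value p')

classes = p.get('class', [])    # list of class names; [] if the attribute is missing
first_key = classes[0]          # use only the first class name
joined_key = ' '.join(classes)  # or rebuild the full attribute string

person = {first_key: p.get_text(strip=True)}
print(person)
```

Which option fits depends on the page: if the first class is the meaningful one (as in the question), classes[0] is enough.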

Alex Montoya answered Nov 10 '22 08:11