Pull Tag Value using BeautifulSoup

Can someone show me how to pull the value of a tag's attribute using BeautifulSoup? I read the documentation but had a hard time navigating it. For example, if I had:

<span title="Funstuff" class="thisClass">Fun Text</span>

How would I pull just "Funstuff" using BeautifulSoup/Python?

Edit: I am using version 3.2.1

user1463925 asked Jul 23 '12 18:07





2 Answers

You need to have something to identify the element you're looking for, and it's hard to tell what it is in this question.

For example, both of these will print out 'Funstuff' in BeautifulSoup 3. One looks for a span element and gets the title, another looks for spans with the given class. Many other valid ways to get to this point are possible.

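# BeautifulSoup 3 (Python 2): the module and the class share the same name
import BeautifulSoup
soup = BeautifulSoup.BeautifulSoup('<html><body><span title="Funstuff" class="thisClass">Fun Text</span></body></html>')
# Walk down to the first span and read its title attribute
print soup.html.body.span['title']
# Or find the span by its class, then read the title attribute
print soup.find('span', {"class": "thisClass"})['title']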
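For anyone on the newer bs4 package with Python 3, the same two lookups might look roughly like the sketch below. It assumes bs4 is installed and uses the built-in "html.parser"; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

html = '<html><body><span title="Funstuff" class="thisClass">Fun Text</span></body></html>'
soup = BeautifulSoup(html, "html.parser")

# Walk down the tree by tag name, then index the tag like a dict.
title_by_path = soup.html.body.span["title"]

# Or search by tag name and class; class_ avoids the reserved word "class".
title_by_class = soup.find("span", class_="thisClass")["title"]

print(title_by_path)
print(title_by_class)
```

Indexing with ["title"] raises KeyError if the attribute is missing, so this form fits best when the attribute is known to be present.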
Steven Huwig answered Oct 06 '22 00:10



A tag's children are available via .contents (http://www.crummy.com/software/BeautifulSoup/bs4/doc/#contents-and-children). In your case, you can find the tag by its CSS class and then extract its contents:

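from bs4 import BeautifulSoup
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup('<span title="Funstuff" class="thisClass">Fun Text</span>', 'html.parser')
# First child of the first element matching the class: the text node 'Fun Text'
soup.select('.thisClass')[0].contents[0]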

http://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors has all the necessary details.
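Since .contents gives the tag's children (here the text "Fun Text") and the question asks for the title attribute, a small sketch of reading both from the same selected tag may help. It assumes bs4 with the built-in "html.parser"; "data-none" is a made-up attribute name used only to show the missing case.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<span title="Funstuff" class="thisClass">Fun Text</span>',
    "html.parser",
)

# select_one returns the first element matching the CSS selector, or None.
tag = soup.select_one(".thisClass")

text = tag.contents[0]          # child text node of the span
title = tag.get("title")        # attribute value, or None if absent
missing = tag.get("data-none")  # hypothetical attribute that is not present

print(text)
print(title)
print(missing)
```

Using .get() instead of tag["title"] avoids a KeyError when the attribute may not exist on every matched tag.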

Manoj I answered Oct 05 '22 23:10
