 

Extract content within a tag with BeautifulSoup

I'd like to extract the content Hello world. Note that there are multiple <table> elements and similar <td colspan="2"> cells on the page as well:

    <table border="0" cellspacing="2" width="800">
      <tr>
        <td colspan="2"><b>Name: </b>Hello world</td>
      </tr>
      <tr> ...

I tried the following:

    hello = soup.find(text='Name: ')
    hello.findPreviousSiblings

But it returned nothing.

In addition, I'm having a problem extracting My home address from the following:

    <td><b>Address:</b></td>
    <td>My home address</td>

I'm using the same method to search for text="Address: ", but how do I navigate to the next <td> and extract its content?

asked May 14 '11 at 02:05 by ready


2 Answers

The .contents attribute works well for extracting text from <tag>text</tag>; it returns a list of the tag's children.


<td>My home address</td> example:

    from bs4 import BeautifulSoup

    s = '<td>My home address</td>'
    soup = BeautifulSoup(s, 'html.parser')
    td = soup.find('td')    # <td>My home address</td>
    td.contents             # ['My home address']
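If you want a plain string rather than the list that `.contents` returns, `.string` and `.get_text()` are the usual alternatives. A minimal sketch, assuming the current `bs4` package with Python's built-in `html.parser`:

```python
from bs4 import BeautifulSoup

s = '<td>My home address</td>'
soup = BeautifulSoup(s, 'html.parser')
td = soup.find('td')

# .contents: a list of the tag's direct children
children = td.contents
print(children)        # ['My home address']

# .string: the tag's single child string (None if it has several children)
print(td.string)       # My home address

# .get_text(): all text inside the tag, at any depth, joined together
print(td.get_text())   # My home address
```

.string is handy for leaf cells like this one; .get_text() is the safer choice when the cell may contain nested tags such as <b>.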

<td><b>Address:</b></td> example:

    from bs4 import BeautifulSoup

    s = '<td><b>Address:</b></td>'
    soup = BeautifulSoup(s, 'html.parser')
    b = soup.find('td').find('b')    # <b>Address:</b>
    b.contents                       # ['Address:']
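For the second part of the question (getting the text of the <td> that follows <td><b>Address:</b></td>), one approach is to find the <b> label, step up to its parent <td>, and take the next sibling <td>. A sketch, assuming `bs4`, `html.parser`, and that the two cells sit in the same <tr> (the wrapping <table>/<tr> markup is assumed); the helper `value_after_label` is hypothetical. The label text must match exactly, so searching for `'Address: '` with a trailing space would not find `<b>Address:</b>`.

```python
from bs4 import BeautifulSoup

html = '''
<table>
  <tr>
    <td><b>Address:</b></td>  <td>My home address</td>
  </tr>
</table>
'''

def value_after_label(soup, label):
    """Hypothetical helper: text of the <td> after the <td> holding <b>label</b>."""
    b = soup.find('b', string=label)             # exact match on the <b> text
    if b is None:
        return None
    label_td = b.find_parent('td')               # <td><b>Address:</b></td>
    value_td = label_td.find_next_sibling('td')  # skips whitespace between cells
    return value_td.get_text(strip=True) if value_td else None

soup = BeautifulSoup(html, 'html.parser')
print(value_after_label(soup, 'Address:'))   # My home address
```

Filtering find_next_sibling by the tag name 'td' matters: the plain .next_sibling of the first cell is the whitespace string between the two cells.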
answered Oct 13 '22 at 22:10 by solvingPuzzles


Use .next instead:

    >>> from BeautifulSoup import BeautifulSoup
    >>> s = '<table border="0" cellspacing="2" width="800"><tr><td colspan="2"><b>Name: </b>Hello world</td></tr><tr>'
    >>> soup = BeautifulSoup(s)
    >>> hello = soup.find(text='Name: ')
    >>> hello.next
    u'Hello world'
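The session above uses Beautiful Soup 3 under Python 2. In Beautiful Soup 4 the counterpart of BS3's `.next` is `.next_element`, and recent bs4 releases prefer the `string=` argument over `text=`. A rough bs4 equivalent, assuming Python 3 and `html.parser`:

```python
from bs4 import BeautifulSoup

s = ('<table border="0" cellspacing="2" width="800"><tr>'
     '<td colspan="2"><b>Name: </b>Hello world</td></tr><tr>')
soup = BeautifulSoup(s, 'html.parser')

label = soup.find(string='Name: ')   # the string 'Name: ' inside <b>
value = label.next_element           # the next node in parse order
print(repr(value))                   # 'Hello world'
```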

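.next and .previous move through the document's elements in the order the parser processed them, while the sibling methods work with the structure of the parse tree. That is why findPreviousSiblings found nothing: 'Name: ' is the only child of <b>, so it has no siblings, whereas .next simply returns whatever the parser saw next, the string Hello world.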
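A small bs4 sketch, assuming `html.parser`, that prints the tree view and the parse-order view side by side for the same label string:

```python
from bs4 import BeautifulSoup

s = '<td colspan="2"><b>Name: </b>Hello world</td>'
soup = BeautifulSoup(s, 'html.parser')
label = soup.find(string='Name: ')

# Tree view: inside <b>, the label string has no siblings
print(label.find_previous_siblings())    # []
print(label.find_next_siblings())        # []

# Tree view one level up: the <b> tag's next sibling is the text node
print(repr(label.parent.next_sibling))   # 'Hello world'

# Parse-order view: the node processed right after 'Name: '
print(repr(label.next_element))          # 'Hello world'
```

In this snippet both routes reach the same string, but they can diverge on other markup: .next_element follows parse order into child tags, while sibling navigation stays at one level of the tree.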

answered Oct 13 '22 at 21:10 by AnalyticsBuilder