
Get text of children in a div with beautifulsoup

Hi, I want to get the description of an app in the Google Play Store (https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de).

import urllib2
from bs4 import BeautifulSoup

soup = BeautifulSoup(urllib2.urlopen("https://play.google.com/store/apps/details?id=com.wetter.androidclient&hl=de"))
result = soup.find_all("div", {"class":"show-more-content text-body"})

With this code I get the whole content of elements with this class, but I can't get only the text inside them. I tried a lot of things with next_sibling or .text, but it always throws errors (ResultSet has no attribute xxx).

I just want to get the text like this: "Die Android App von wetter.com! Sie erhalten: ..:"

Can anyone help me?

asked Jan 02 '14 18:01 by Si Mon




1 Answer

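find_all() returns a ResultSet, which is a list of matching elements and has no .text of its own. Use the .text attribute on the individual elements instead; since you have a list of results, loop over it: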

for res in result:
    print(res.text)

.text is a property that proxies for the Tag.get_text() method.
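As a minimal sketch of the loop above, here is a self-contained version that runs against an inline HTML string instead of the live page. The markup is a hypothetical stand-in (the real Play Store page may be structured differently), and the separator and strip arguments to get_text() are optional extras for tidying whitespace:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page markup; the real page may differ.
html = """
<div class="show-more-content text-body">
  <div>Die Android App von wetter.com!</div>
  <p>Sie erhalten:</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
result = soup.find_all("div", {"class": "show-more-content text-body"})

# The ResultSet itself has no .text; only the elements inside it do.
resultset_has_text = hasattr(result, "text")

# get_text() joins the stripped text nodes of each element with one space.
texts = [res.get_text(separator=" ", strip=True) for res in result]
print(texts)
```

Calling get_text() directly rather than reading .text is only needed when you want to pass arguments like these; with no arguments the two are equivalent.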

Alternatively, if there is only ever supposed to be one such <div>, use .find() instead of .find_all():

result = soup.find("div", {"class":"show-more-content text-body"})
print(result.text)
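
One caveat with .find(): it returns None when nothing matches, so result.text would then raise an AttributeError. The sketch below guards against that. It also shows that a multi-class string such as "show-more-content text-body" is compared against the class attribute as one exact string, so it misses the same classes in a different order, while a CSS selector via select_one() does not depend on order. The HTML is a hypothetical example, not the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with the two classes in the opposite order.
html = '<div class="text-body show-more-content">Sie erhalten: ...</div>'
soup = BeautifulSoup(html, "html.parser")

# Exact-string match on the class attribute: the order differs, so no match.
exact = soup.find("div", {"class": "show-more-content text-body"})

# A CSS selector matches elements carrying both classes, in any order.
by_selector = soup.select_one("div.show-more-content.text-body")

# Guard against None before asking for the text.
description = by_selector.get_text(strip=True) if by_selector is not None else None
print(exact, description)
```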

answered Oct 07 '22 01:10 by Martijn Pieters