
BeautifulSoup: get contents[] as a single string

Anyone know an elegant way to get the entire contents of a soup object as a single string?

At the moment I'm getting contents, which is of course a list, and then iterating over it:

    notices = soup.find("div", {"class": "middlecontent"})
    con = ""
    for content in notices.contents:
        con += str(content)
    print(con)

Thanks!

AP257 asked Dec 20 '10 10:12




2 Answers

What about contents = str(notices)?

Or maybe contents = notices.renderContents(), which leaves out the enclosing div tag and returns only what is inside it.
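
To make the difference between the two suggestions concrete, here is a minimal sketch. It assumes Beautiful Soup 4 (the bs4 package) rather than the older release the question was written against, and it uses decode_contents(), which in bs4 returns the inner markup as a string, in the role renderContents() plays above. The sample HTML is made up for illustration.

```python
# Assumes Beautiful Soup 4 is installed (pip install beautifulsoup4).
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the asker's page.
html = '<div class="middlecontent"><p>First notice</p> and <b>second</b></div>'
soup = BeautifulSoup(html, "html.parser")
notices = soup.find("div", {"class": "middlecontent"})

# str() serializes the tag itself, including the enclosing <div>.
outer = str(notices)

# decode_contents() serializes only the children, as a single str.
inner = notices.decode_contents()

print(outer)
print(inner)
```

So str(notices) is the one to use if the wrapper div should be kept, and the contents-only call if it should not.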

Fábio Diniz answered Sep 21 '22 19:09



You can use the join() method:

    notices = soup.find("div", {"class": "middlecontent"})
    contents = "".join([str(item) for item in notices.contents])

Or, using a generator expression:

    contents = "".join(str(item) for item in notices.contents)
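
As a runnable version of the join() approach, the sketch below wraps it in a small helper. It assumes Beautiful Soup 4 and invented sample HTML; inner_markup is a hypothetical name, not part of the library.

```python
# Assumes Beautiful Soup 4 is installed (pip install beautifulsoup4).
from bs4 import BeautifulSoup


def inner_markup(tag):
    """Concatenate the string form of each direct child of tag.

    Hypothetical helper for illustration only.
    """
    return "".join(str(child) for child in tag.contents)


# Hypothetical sample markup standing in for the asker's page.
html = '<div class="middlecontent"><p>One</p><p>Two</p></div>'
soup = BeautifulSoup(html, "html.parser")
notices = soup.find("div", {"class": "middlecontent"})

result = inner_markup(notices)
print(result)
```

Compared with building the string using += in a loop, join() assembles the result in one pass, which is the usual Python idiom for concatenating many pieces.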
Frédéric Hamidi answered Sep 22 '22 19:09
