How to convert BeautifulSoup.ResultSet to string

Question

I parsed an HTML page with BeautifulSoup's .findAll and stored the result in a variable named result. If I type result in the Python shell and press Enter, I see normal text, as expected. But when I tried to post-process the result as a string object, I noticed that str(result) returns garbage, like this sample:

\xd1\x87\xd0\xb8\xd0\xbb\xd0\xbd\xd0\xb8\xd1\x86\xd0\xb0</a><br />
<hr />
</div>

The HTML page source is UTF-8 encoded.

How can I handle this?

Code is basically this, in case it matters:

import urllib
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(urllib.urlopen(url).read())
result = soup.findAll(something)

The Python version is 2.7.

Johnny Brown · Accepted Answer

Tested with Python 2.6.7 and BeautifulSoup 3.2.0 (BeautifulSoup.__version__).

This worked for me:

unicode.join(u'\n', map(unicode, result))

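I'm pretty sure result is a BeautifulSoup.ResultSet object, which seems to be an extension of the standard Python list. That is why each element is converted with unicode() on its own and the pieces are then joined, instead of calling str() on the whole list.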
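
To show the pattern without needing the library, here is a minimal sketch. FakeTag is a hypothetical stand-in for a parsed tag (it is not a BeautifulSoup class), and the plain list stands in for the ResultSet, on the assumption that it behaves like a list. The text_type alias only exists so the same snippet runs on Python 2 and Python 3.

```python
import sys

# The text type is `unicode` on Python 2 and `str` on Python 3.
text_type = str if sys.version_info[0] >= 3 else unicode


class FakeTag(object):
    """Hypothetical stand-in for one parsed tag (not a BeautifulSoup class)."""

    def __init__(self, markup):
        self.markup = markup

    def __unicode__(self):
        # Called by unicode(tag) on Python 2.
        return self.markup

    def __str__(self):
        # Called by str(tag) on Python 3.
        return self.markup


# Stand-in for the list-like object returned by findAll.
result = [
    FakeTag(u"<a>\u0447\u0438\u043b\u043d\u0438\u0446\u0430</a>"),
    FakeTag(u"<hr />"),
]

# Convert each element to text first, then join the pieces with newlines.
joined = u"\n".join(map(text_type, result))
print(joined)
```

The point is that the conversion happens per element, so the final value is a single text object that can be post-processed like any other string.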

ChangePicture · Answer

import urllib
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(urllib.urlopen(url).read())
# findAll returns multiple parsed results
result = soup.findAll(something)
# iterate over the results
for line in result:
    # get the string value of each element; replace 'utf-8' with the charset you need
    print line.__str__('utf-8')

BTW: the BeautifulSoup version is 3.2.1.
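
As far as I can tell, the value produced for each element in the loop above is a byte string in the requested charset, so it still has to be decoded if you want a text object. A small sketch using only the bytes from the sample in the question (no BeautifulSoup involved), assuming the charset is UTF-8 as the question states:

```python
# Byte string copied from the sample output in the question.
raw = b"\xd1\x87\xd0\xb8\xd0\xbb\xd0\xbd\xd0\xb8\xd1\x86\xd0\xb0</a><br />"

# Decoding with the page's charset (UTF-8 here) yields a text object.
text = raw.decode("utf-8")

# The escapes were UTF-8 bytes for seven Cyrillic letters, not corrupted data.
cyrillic = text[:7]
markup_tail = text[7:]
```

So the "garbage" in the question is just the escaped form of valid UTF-8 bytes; after decoding, the first seven characters are Cyrillic letters and the rest is the ordinary markup.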

Tags: python, unicode, beautifulsoup

theta
