Hi, I'm using BeautifulSoup to parse a website and get a name as output. But after running the script, I get [u'word1', u'word2', u'word3'] as output. What I'm looking for is 'word1 word2 word3'. How do I get rid of this u' and make the result a single string?
from bs4 import BeautifulSoup
import urllib2
import re
myfile = open("base/dogs.txt","w+")
myfile.close()
url="http://trackinfo.com/entries-race.jsp?raceid=GBR$20140302A01"
page=urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
names=soup.findAll('a',{'href':re.compile("dog")})
myfile = open("base/dogs.txt","w+")
for eachname in names:
    d = (str(eachname.string.split()))+"\n"
    print [x.encode('ascii') for x in d]
    myfile.write(d)
myfile.close()
BeautifulSoup and Unicode, Dammit!
>>> from bs4 import BeautifulSoup
>>> BeautifulSoup("Sacr&eacute; bleu!")
<html><body><p>Sacré bleu!</p></body></html>
Isn't that great? When you make the soup, the document is converted to Unicode, and HTML entities are converted to Unicode characters. So you get Unicode objects as results. That is intended, and there is nothing wrong with it.
So your question is really about Unicode. Unicode is explained in this video. Don't like videos? Read an Introduction to Unicode.
The u is short for "the following string is a Unicode string". Instead of the 128 ASCII characters, you can now use any Unicode character, of which there are more than 110,000 at the time of writing. The u isn't saved to a file or database. It is visual feedback, shown when the string's repr is printed (for example when you print a list), so you can see that you're dealing with a Unicode string. Use it like a normal string, because it is one.
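A minimal sketch of where the u comes from, assuming Python 2 for the printed output (Python 3 never shows the u prefix); the list is the example from the question:

```python
# The u prefix belongs to repr(), not to the text itself.
words = [u'word1', u'word2', u'word3']

# Printing a list calls repr() on each element: [u'word1', u'word2', u'word3'] in Python 2
print(words)

# Printing a single element shows only its characters: word1
print(words[0])

as_repr = repr(words[0])  # "u'word1'" in Python 2, "'word1'" in Python 3
as_text = words[0]        # just the five characters w, o, r, d, 1
```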
Moral of this story:
u'…'
The answers here that use .encode() give you what you asked for, but probably not what you need. You can keep the strings as Unicode and simply avoid printing them in a form that shows their type. They will still be [u'word1', u'word2', u'word3'], which avoids breaking support for languages that can't be represented in ASCII, but they will be printed as word1 word2 word3.
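To illustrate why encoding to ASCII is fragile, here is a small sketch. The name Café Racer is made up for illustration and contains one non-ASCII character (é, written as the escape u'\xe9'):

```python
# Hypothetical dog name containing a non-ASCII character (u'\xe9' is é)
name_parts = [u'Caf\xe9', u'Racer']

# Joining keeps the text as Unicode, so the é survives
joined = u' '.join(name_parts)

# Encoding to ASCII fails as soon as a character falls outside ASCII
try:
    joined.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(ascii_ok)  # False
```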
Just do:
for eachname in names:
    d = ' '.join(eachname.string.split()) + '\n'
    print d
    myfile.write(d)
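One caveat, assuming Python 2: a file from the built-in open() expects byte strings, so myfile.write(d) with a unicode d containing non-ASCII characters triggers an implicit ASCII encode and raises UnicodeEncodeError. A sketch that sidesteps this with io.open and an explicit UTF-8 encoding follows. The scraped list is hypothetical sample data standing in for eachname.string, and a temporary file replaces base/dogs.txt. The same code also runs on Python 3.

```python
import io
import os
import tempfile

# Hypothetical stand-ins for eachname.string values scraped with BeautifulSoup
scraped = [u'  Caf\xe9   Racer ', u'Lucky\nStar']

# Temporary path for the sketch; the original script writes to base/dogs.txt
path = os.path.join(tempfile.mkdtemp(), 'dogs.txt')

# io.open with an explicit encoding accepts unicode text and writes UTF-8 bytes
with io.open(path, 'w', encoding='utf-8') as myfile:
    for raw in scraped:
        # split() drops runs of whitespace; join puts single spaces back
        d = u' '.join(raw.split()) + u'\n'
        myfile.write(d)

# Read the file back to check what was actually stored (no u prefixes)
with io.open(path, encoding='utf-8') as f:
    lines = f.read().splitlines()
```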