Remove u' from a webscrape output

Question

Hi, I'm using BeautifulSoup to parse a website and get a name as output. But after running the script, I get [u'word1', u'word2', u'word3'] as output. What I'm looking for is 'word1 word2 word3'. How do I get rid of the u' and make the result a single string?

from bs4 import BeautifulSoup
import urllib2
import re

myfile = open("base/dogs.txt","w+")
myfile.close()

url="http://trackinfo.com/entries-race.jsp?raceid=GBR$20140302A01"
page=urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
names=soup.findAll('a',{'href':re.compile("dog")})
myfile = open("base/dogs.txt","w+")
for eachname in names:
    d = (str(eachname.string.split()))+"\n"
    print [x.encode('ascii') for x in d]
    myfile.write(d)

myfile.close()

allcaps · Accepted Answer

BeautifulSoup and Unicode, Dammit!

>>> from bs4 import BeautifulSoup
>>> BeautifulSoup("Sacr&eacute; bleu!")
<html><body><p>Sacré bleu!</p></body></html>

Isn't that great? When the soup is made, the document is converted to Unicode and HTML entities are converted to Unicode characters. So you get Unicode objects as results, as intended. Nothing wrong with that.

So your question is about Unicode. And Unicode is explained in this video. Don't like videos? Read an introduction to Unicode.

The u prefix means 'the following string is a Unicode string'. Instead of the 128 ASCII characters, you can now use all Unicode characters, more than 110,000 at the moment. The u isn't saved to a file or database. It is visual feedback in the printed representation (the repr), so you can see that you're dealing with a Unicode string. Use it like a normal string, because it is a normal string.
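To see that the u is only part of how the list is displayed, here is a minimal sketch. The list `words` is a made-up stand-in for the scraped result, and the variable names are only for illustration. It runs on Python 2 and 3, though Python 3 no longer shows the u prefix at all.

```python
# Hypothetical stand-in for the list the scrape returns.
words = [u'word1', u'word2', u'word3']

# repr() is what you see when a list is printed: it shows quotes and,
# on Python 2, the u type marker for each element. That is display only.
shown = repr(words)

# The data itself has no u in it: the first element is five characters long.
first_length = len(words[0])

# Joining the elements gives one space-separated Unicode string.
joined = u' '.join(words)

print(shown)
print(joined)
```

The second print gives word1 word2 word3, with no brackets, quotes or prefix, because printing a string shows its contents rather than its repr.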

Moral of this story:

☺ when you see a `u'…'`

Charles Duffy · Answer

The answers here using .encode() give you what you asked for, but probably not what you need. You can keep the strings as Unicode and simply print them in a way that doesn't show their type. The data is still [u'word1', u'word2', u'word3'], which avoids breaking support for languages that can't be represented in ASCII, but it is printed as word1 word2 word3.

Just do:

for eachname in names:
    d = ' '.join(eachname.string.split()) + '\n'
    print d
    myfile.write(d)
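One caveat if the names can contain non-ASCII characters: on Python 2, writing a Unicode string to a file opened with plain open() can raise UnicodeEncodeError. A sketch of one way around that, assuming UTF-8 output is acceptable, is to open the file with io.open and an explicit encoding. Here `raw_names`, `normalize` and the temporary path are hypothetical stand-ins, not part of the original script.

```python
import io
import os
import tempfile

# Hypothetical stand-ins for the `eachname.string` values from the scrape.
raw_names = [u'  Sacr\xe9   Bleu ', u'word1 word2\tword3']


def normalize(text):
    """Collapse runs of whitespace into single spaces."""
    return u' '.join(text.split())


# Temporary file used here instead of base/dogs.txt so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'dogs.txt')

# io.open encodes Unicode to UTF-8 on write, on both Python 2 and 3.
with io.open(path, 'w', encoding='utf-8') as myfile:
    for raw in raw_names:
        myfile.write(normalize(raw) + u'\n')

# Read the file back to check what was stored.
with io.open(path, encoding='utf-8') as myfile:
    lines = myfile.read().splitlines()
```

The strings stay Unicode all the way through, and the encoding only happens at the file boundary.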


Tags:

python

beautifulsoup

web-scraping

user3319895
