UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026'

I'm learning urllib2 and Beautiful Soup, and in my first tests I'm getting errors like:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2026' in position 10: ordinal not in range(128)
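The error can be reproduced without urllib2 or Beautiful Soup. It is what any unicode string containing U+2026 (the ellipsis character …) raises when it is encoded with the ASCII codec. A minimal sketch, runnable on Python 3; the sample string and the try_ascii helper are made up for illustration:

```python
# Reproduce the UnicodeEncodeError from the question with a made-up string.
SAMPLE = u"Loading\u2026"  # "Loading…"; U+2026 is HORIZONTAL ELLIPSIS


def try_ascii(text):
    """Return the ASCII bytes, or the UnicodeEncodeError that encoding raises."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc


err = try_ascii(SAMPLE)
print(err)  # -> 'ascii' codec can't encode character '\u2026' in position 7: ordinal not in range(128)
```

Python 2 reports the same error, only with u'\u2026' in the message.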

There are lots of posts about this type of error, and I have tried the solutions I can understand, but each one seems to come with a catch-22, e.g.:

I want to print post.text (where text is a Beautiful Soup attribute that just returns the text). Both str(post.text) and post.text produce the Unicode errors (on characters like right apostrophes ’ and ellipses …).

So I added post = unicode(post) above str(post.text), and then I get:

AttributeError: 'unicode' object has no attribute 'text'

I also tried (post.text).encode() and (post.text).renderContents(). The latter produces the error:

AttributeError: 'unicode' object has no attribute 'renderContents'

Then I tried str(post.text).renderContents() and got the error:

AttributeError: 'str' object has no attribute 'renderContents'

It would be great if I could just declare once, at the top of the script, something like "make this content interpretable" and still have access to the text attribute I need.
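One commonly used Python 2 approach that comes close to "declare it once" is to wrap the output stream in a UTF-8 writer from the standard codecs module, e.g. sys.stdout = codecs.getwriter("utf-8")(sys.stdout) near the top of the script, so every unicode object written to it is encoded as UTF-8. It does not fix str + unicode concatenation errors, because those happen before anything reaches the stream. The sketch below applies the same wrapper to an in-memory byte buffer instead of sys.stdout, so it also runs on Python 3 (where sys.stdout is already a text stream and should not be wrapped this way); the sample text is made up:

```python
import codecs
import io

# In Python 2 the wrapper would typically be applied to sys.stdout:
#     sys.stdout = codecs.getwriter("utf-8")(sys.stdout)
# Here an in-memory byte buffer stands in for the output stream.
raw = io.BytesIO()
out = codecs.getwriter("utf-8")(raw)

# Unicode goes in (right apostrophe and ellipsis); UTF-8 bytes come out.
out.write(u"It\u2019s loading\u2026")
encoded = raw.getvalue()
print(encoded)  # -> b'It\xe2\x80\x99s loading\xe2\x80\xa6'
```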

Update, after suggestions:

If I add post = post.decode("utf-8") above str(post.text), I get:

TypeError: unsupported operand type(s) for -: 'str' and 'int'

If I add post = post.decode() above str(post.text), I get:

AttributeError: 'unicode' object has no attribute 'text'

If I add post = post.encode("utf-8") above (post.text), I get:

AttributeError: 'str' object has no attribute 'text'

I tried print post.text.encode('utf-8') and got:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 39: ordinal not in range(128)
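The byte 0xe2 in that message is a useful clue: it is the first byte of the UTF-8 encoding of both the right apostrophe (U+2019) and the ellipsis (U+2026). In Python 2, combining an already-encoded str with a unicode object (for example with +) makes Python decode the str implicitly with the ASCII codec, which fails on exactly that byte. This sketch performs that decode explicitly so it runs on Python 3, where str + bytes raises TypeError instead; the implicit_ascii_decode helper and the sample text are hypothetical:

```python
# UTF-8 bytes for the two characters mentioned in the question.
apostrophe_bytes = u"\u2019".encode("utf-8")  # b'\xe2\x80\x99'
ellipsis_bytes = u"\u2026".encode("utf-8")    # b'\xe2\x80\xa6'

# What something like post.text.encode('utf-8') hands back: bytes, not text.
encoded = u"post text\u2026".encode("utf-8")


def implicit_ascii_decode(data):
    """Mimic Python 2's implicit str -> unicode coercion (ASCII codec)."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc


err = implicit_ascii_decode(encoded)
print(hex(err.object[err.start]), err.start)  # -> 0xe2 9
```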

Just to try anything that might work, I also installed lxml for Windows from here and used it as the parser with:

parsed_content = BeautifulSoup(original_content, "lxml")

according to http://www.crummy.com/software/BeautifulSoup/bs4/doc/#output-formatters.

These steps didn't seem to make a difference.

I'm using Python 2.7.4 and Beautiful Soup 4.

Solution:

After getting a deeper understanding of Unicode, UTF-8 and Beautiful Soup types, I found the problem was in how I was printing. I removed all my str() calls and concatenations, e.g. I changed str(something) + post.text + str(something_else) to something, post.text, something_else. It now seems to print correctly, except that I have less control over the formatting at this stage (e.g. print inserts a space at each comma).
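One way to get the formatting control back without returning to str() and +: build the whole line as a unicode string with format(), and convert to bytes only once, at the point of output. A minimal sketch, assuming the pieces are already unicode; the format_post helper and its sample values are hypothetical, and the pattern behaves the same on Python 2.7 and Python 3:

```python
def format_post(title, body, count):
    """Build one output line entirely as unicode text (hypothetical helper)."""
    # Every piece stays unicode; format() controls spacing and punctuation
    # exactly, unlike print with commas, which adds a space between items.
    return u"{0} | {1} ({2} words)".format(title, body, count)


line = format_post(u"News", u"It\u2019s here\u2026", 2)
data = line.encode("utf-8")  # encode once, at the output boundary
print(data)
```

On Python 2, print data writes the UTF-8 bytes directly; on Python 3 you would normally just print(line) and let the stream do the encoding.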

asked Apr 27 '13 11:04

user1063287

2 Answers

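In Python 2, printing a unicode object implicitly encodes it with the output stream's encoding, and when that encoding is unknown (for example when output is redirected to a file or pipe) Python falls back to ASCII. Any character that can't be encoded that way raises this error. You probably want to encode the text explicitly and then print the resulting str: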

print post.text.encode('utf-8')
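Whether printing a unicode string fails depends on the encoding of the stream it ends up in. The sketch below (Python 3, standard library only) simulates two output streams with io.TextIOWrapper over an in-memory buffer: with ASCII the ellipsis cannot be written, while with UTF-8 it comes out as the same bytes that .encode('utf-8') produces. The write_through helper and the sample text are hypothetical:

```python
import io


def write_through(encoding, text):
    """Write text to a simulated output stream; return the bytes that reach it."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)  # errors="strict" by default
    stream.write(text)
    stream.flush()
    stream.detach()  # keep raw open after the wrapper is discarded
    return raw.getvalue()


text = u"post text\u2026"
print(write_through("utf-8", text) == text.encode("utf-8"))  # -> True
try:
    write_through("ascii", text)
except UnicodeEncodeError as exc:
    print("ascii stream:", exc.reason)  # -> ascii stream: ordinal not in range(128)
```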

answered Oct 06 '22 14:10

icktoofay

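    import urllib.request
    from bs4 import BeautifulSoup

    html = urllib.request.urlopen(THE_URL).read()
    soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly
    print("'" + soup.encode("ascii").decode("ascii") + "'")  # decode() avoids the b'...' repr of str(bytes)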

worked for me ;-)
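This avoids the error because Beautiful Soup 4's encode() defaults to the "xmlcharrefreplace" error handler: characters that don't fit the target encoding are replaced with numeric XML character references instead of raising. Python's own encode() behaves the same way when given that handler explicitly, as this standard-library sketch shows (the sample string is made up). Note that the output then contains references like &#8230; instead of the original characters:

```python
text = u"Loading\u2026 it\u2019s here"

# Strict ASCII encoding would raise UnicodeEncodeError; xmlcharrefreplace
# substitutes a numeric character reference for each non-ASCII character.
as_ascii = text.encode("ascii", "xmlcharrefreplace")
print(as_ascii.decode("ascii"))  # -> Loading&#8230; it&#8217;s here
```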

answered Oct 06 '22 16:10

Patpog

Tags: python, unicode, beautifulsoup, urllib2, python-2.7
