
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)

I am trying to pull a list of 500 restaurants in Amsterdam from TripAdvisor; however, after the 308th restaurant I get the following error:

Traceback (most recent call last):
  File "C:/Users/dtrinh/PycharmProjects/TripAdvisorData/LinkPull-HK.py", line 43, in <module>
    writer.writerow(rest_array)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)

I tried several things I found on StackOverflow, but nothing has worked so far. It would be great if someone could take a look at my code and suggest a solution.

        for item in soup2.findAll('div', attrs={'class', 'title'}):
            if 'Cuisine' in item.text:
                item.text.strip()
                content = item.findNext('div', attrs=('class', 'content'))
                cuisine_type = content.text.encode('utf8', 'ignore').strip().split(r'\xa0')
        rest_array = [account_name, rest_address, postcode, phonenumber, cuisine_type]
        #print rest_array
        with open('ListingsPull-Amsterdam.csv', 'a') as file:
                writer = csv.writer(file)
                writer.writerow(rest_array)
    break
dtrinh asked Nov 15 '16 21:11



2 Answers

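rest_array contains unicode strings. You are on Python 2.7, where csv.writer expects byte strings (str). When it is given a unicode value, it encodes that value implicitly with the default 'ascii' codec, which fails on any non-ASCII character such as u'\u2019'.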
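The failure can be reproduced without the scraper or the CSV file. The snippet below is only a minimal sketch: the restaurant name is a made-up sample value, not data from the question, and the code should behave the same on Python 2.7 and 3.x.

```python
# Hypothetical sample value containing U+2019 (right single quotation mark).
text = u"Joe\u2019s Diner"

# Encoding to ASCII fails, because U+2019 is outside the 0-127 range.
try:
    text.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# Encoding to UTF-8 succeeds; U+2019 becomes the three bytes E2 80 99.
utf8_bytes = text.encode("utf-8")
```

This is the same error class as in the traceback; only the reported position depends on the actual string.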

I suggest encoding the values as "utf8" before writing them:

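with open('ListingsPull-Amsterdam.csv', mode='ab') as fd:  # binary mode, as the Python 2 csv module expects
    writer = csv.writer(fd)
    # Encode only unicode values; byte strings are written unchanged.
    rest_array = [text.encode("utf8") if isinstance(text, unicode) else text
                  for text in rest_array]
    writer.writerow(rest_array)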

Note: please don't use file as a variable name, because in Python 2 it shadows the built-in file type (the type of the objects that open() returns).

If you want to open this CSV file with Microsoft Excel, you may consider using another encoding, for instance "cp1252", which can represent the u"\u2019" character.
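As a rough illustration of the cp1252 option, the sketch below encodes a made-up sample value (not data from the question). It also shows the trade-off: cp1252 is a single-byte encoding, so characters it does not cover still raise an error unless an error handler such as "replace" is passed.

```python
# Hypothetical sample value containing U+2019 (right single quotation mark).
name = u"Joe\u2019s Diner"

# cp1252 maps U+2019 to the single byte 0x92, so this succeeds.
cp1252_bytes = name.encode("cp1252")

# A character outside cp1252 (here a CJK character) cannot be encoded;
# the "replace" error handler substitutes "?" instead of raising.
lossy_bytes = u"\u4e2d".encode("cp1252", "replace")
```

Whether a lossy replacement is acceptable depends on the data; UTF-8 avoids the problem because it can encode every Unicode character.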

Laurent LAPORTE answered Nov 17 '22 10:11



You're writing one or more non-ASCII characters to your CSV output file. Make sure you open the output file with a character encoding that can represent those characters. UTF-8 is usually a safe bet. Try this:

with open('ListingsPull-Amsterdam.csv', 'a', encoding='utf-8', newline='') as file:  # newline='' as the csv docs recommend
    writer = csv.writer(file)
    writer.writerow(rest_array)

Edit: this is for Python 3.x only, sorry. In Python 2.7 the built-in open() does not accept an encoding argument.
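For Python 3, the approach can be sketched as a pair of small helpers. The names append_row and read_rows are hypothetical, chosen only for this illustration, and the file path is supplied by the caller rather than taken from the question.

```python
import csv


def append_row(path, row, encoding="utf-8"):
    """Append one row of text values to a CSV file (Python 3 only)."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "a", encoding=encoding, newline="") as fd:
        csv.writer(fd).writerow(row)


def read_rows(path, encoding="utf-8"):
    """Read every row back as a list of lists of strings."""
    with open(path, encoding=encoding, newline="") as fd:
        return list(csv.reader(fd))
```

Reading the file back with the same encoding is a quick way to check that characters such as U+2019 survived the round trip.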

Irmen de Jong answered Nov 17 '22 10:11
