UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

Tags: python, unicode

I don't know exactly what causes this error or how to fix it. I get it when running this code.

Traceback (most recent call last):
  File "t1.py", line 86, in <module>
    write_results(results)
  File "t1.py", line 34, in write_results
    dw.writerows(results)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 154, in writerows
    return self.writer.writerows(rows)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

Any explanation is really appreciated!

I changed the code and now I get this error:

  File "t1.py", line 88, in <module>
    write_results(results)
  File "t1.py", line 35, in write_results
    dw.writerows(results)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 154, in writerows
    return self.writer.writerows(rows)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

Here's the change:

with codecs.open('results.csv', 'wb', 'utf-8') as f:
    dw = csv.DictWriter(f, fieldnames=fields, delimiter='|')
    dw.writer.writerow(dw.fieldnames)
    dw.writerows(results)

Mona Jalal asked Sep 30 '22 22:09


1 Answer

The error is raised by this part of the code:

with open('results.csv', 'w') as f:
    dw = csv.DictWriter(f, fieldnames=fields, delimiter='|')
    dw.writer.writerow(dw.fieldnames)
    dw.writerows(results)

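You're opening the file without choosing an encoding, and then you're trying to write non-ASCII data to it. When the Python 2 csv module has to turn a Unicode string into bytes, it falls back to the ASCII codec, which fails on any character outside the ASCII range. I guess that whoever wrote that script happened to never encounter a non-ASCII character during testing, so they never ran into the error.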
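The same failure can be reproduced without the csv module at all. The snippet below is only a minimal sketch: `title` is a made-up sample value, not something taken from the script in the question.

```python
# Made-up sample value; any text with a non-ASCII character behaves the same.
title = u"caf\xe9"

try:
    # Encoding with the ASCII codec is what effectively happens
    # when no other encoding has been applied to the string.
    title.encode("ascii")
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

# Shows the same "ordinal not in range(128)" complaint as the traceback.
print(message)

# Encoding explicitly to UTF-8 succeeds and yields plain bytes.
encoded = title.encode("utf-8")
```

The exact wording of the message differs slightly between Python versions, but the codec name and the "ordinal not in range(128)" part are the same.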

But if you look at the docs for the Python 2 csv module, you'll see that the module doesn't support Unicode input (which is what Beautiful Soup returns), that CSV files should be opened in binary mode, and that only UTF-8 or printable ASCII are safe to write.

So you need to encode all the strings to UTF-8 before writing them. I first thought that it should suffice to encode the strings on writing, but the Python 2 csv module chokes on the Unicode strings anyway. So I guess there's no other way but to encode each string explicitly:

In parse_results(), change the line

results.append({'url': url, 'create_date': create_date, 'title': title})

to

results.append({'url': url, 'create_date': create_date, 'title': title.encode("utf-8")})

That might already be sufficient since I don't expect URLs or dates to contain non-ASCII characters.
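If it turns out that other fields can contain non-ASCII text as well, the same idea can be applied to every value in a row instead of just the title. The following is a rough sketch under that assumption; `encode_row` and `text_type` are hypothetical names, and the sample row is invented for illustration.

```python
# Hypothetical helper, not part of the original script.
text_type = type(u"")  # `unicode` on Python 2, `str` on Python 3


def encode_row(row, encoding="utf-8"):
    """Return a copy of `row` with every text value encoded to bytes."""
    return {
        key: value.encode(encoding) if isinstance(value, text_type) else value
        for key, value in row.items()
    }


# Invented sample row using the same keys as the results dictionaries.
row = {
    "url": u"http://example.com/",
    "create_date": u"2014-09-30",
    "title": u"caf\xe9",
}
encoded_row = encode_row(row)
```

In write_results(), something like dw.writerows(encode_row(r) for r in results) should then hand the csv module only byte strings. This is meant for the Python 2.7 setup shown in the traceback; the Python 3 csv module works with text directly, so the rows should not be pre-encoded there.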

Tim Pietzcker answered Oct 12 '22 22:10