I don't know exactly what the source of this error is or how to fix it. I get it when running this code.
Traceback (most recent call last):
  File "t1.py", line 86, in <module>
    write_results(results)
  File "t1.py", line 34, in write_results
    dw.writerows(results)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 154, in writerows
    return self.writer.writerows(rows)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
Any explanation is really appreciated!
I changed the code and now I get this error:
  File "t1.py", line 88, in <module>
    write_results(results)
  File "t1.py", line 35, in write_results
    dw.writerows(results)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 154, in writerows
    return self.writer.writerows(rows)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
Here's the change:
with codecs.open('results.csv', 'wb', 'utf-8') as f:
    dw = csv.DictWriter(f, fieldnames=fields, delimiter='|')
    dw.writer.writerow(dw.fieldnames)
    dw.writerows(results)
The error is raised by this part of the code:
with open('results.csv', 'w') as f:
    dw = csv.DictWriter(f, fieldnames=fields, delimiter='|')
    dw.writer.writerow(dw.fieldnames)
    dw.writerows(results)
You're opening a plain file with no encoding, and then you're trying to write non-ASCII data to it, so Python 2 falls back to the ASCII codec and fails. I guess that whoever wrote that script happened to never encounter a non-ASCII character during testing, so they never ran into an error.
But if you look at the docs for the csv module, you'll see that the module can't correctly handle Unicode strings (which is what Beautiful Soup returns), that CSV files always have to be opened in binary mode, and that only UTF-8 or ASCII are safe to write.
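The failure can be reproduced in isolation. This is only a minimal sketch: the title value is a made-up stand-in for a scraped string, and the explicit encode("ascii") call approximates the implicit conversion that happens when the Python 2 csv writer receives a Unicode string.

```python
# Hypothetical value standing in for a scraped title (not from the original script).
title = u"caf\xe9"

try:
    # Roughly what happens implicitly when the Python 2 csv writer
    # is handed a Unicode string: it gets encoded with the ASCII codec.
    title.encode("ascii")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)

# Encoding explicitly to UTF-8 yields a byte string that is safe to write.
encoded = title.encode("utf-8")
print(repr(encoded))
```

The exception message has the same shape as the one in the traceback, which suggests the problem lies in the strings being written rather than in how the file is opened.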
So you need to encode all the strings to UTF-8 before writing them. I first thought that it should suffice to encode the strings on writing, but the Python 2 csv module chokes on the Unicode strings anyway. So I guess there's no other way but to encode each string explicitly:
In parse_results(), change the line
results.append({'url': url, 'create_date': create_date, 'title': title})
to
results.append({'url': url, 'create_date': create_date, 'title': title.encode("utf-8")})
That might already be sufficient since I don't expect URLs or dates to contain non-ASCII characters.
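If other fields turn out to contain non-ASCII characters as well, every text value in a row could be encoded in one place instead. The following is a sketch under assumptions: encode_row is a hypothetical helper that is not part of the original script, and the sample row is made up.

```python
# type(u"") is `unicode` on Python 2 and `str` on Python 3.
text_type = type(u"")


def encode_row(row, encoding="utf-8"):
    """Return a copy of `row` with every text value encoded to bytes.

    Hypothetical helper; values that are not text are left untouched.
    """
    return dict(
        (key, value.encode(encoding) if isinstance(value, text_type) else value)
        for key, value in row.items()
    )


# Made-up row in the shape that parse_results() builds.
sample = {'url': u'http://example.com', 'create_date': u'2014-01-01', 'title': u'caf\xe9'}
encoded_sample = encode_row(sample)
print(repr(encoded_sample['title']))
```

With such a helper, the last line of write_results() could become dw.writerows(encode_row(r) for r in results). This applies to the Python 2 csv module only; the Python 3 csv module expects text strings rather than bytes.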