
Python 2.7: Setting I/O Encoding, ’?

I am attempting to write a line to a text file in Python 2.7 and have the following code:

# -*- coding: utf-8 -*-
...
f = open(os.path.join(os.path.dirname(__file__), 'output.txt'), 'w')
f.write('Smith’s BaseBall Cap')  # Note the strangely shaped apostrophe

However, in output.txt I get Smith’s BaseBall Cap instead. How can I correct this encoding problem? Any protips for this sort of issue?

zhuyxn asked Dec 08 '22 22:12


1 Answer

You have declared your file to be encoded with UTF-8, so your byte-string literal is in UTF-8. The curly apostrophe is U+2019. In UTF-8, this is encoded as three bytes, \xE2\x80\x99. Those three bytes are written to your output file. Then, when you examine the output file, it is interpreted as something other than UTF-8, and you see the three incorrect characters instead.

In Windows-1252, those three bytes display as ’. (In Mac OS Roman they would display as ‚Äô.)
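To check which legacy encoding produces which characters, you can decode the three bytes yourself. This is only a sketch: the variable names are made up for illustration, and `cp1252` and `mac_roman` are just two candidate encodings a viewer might be using.

```python
# -*- coding: utf-8 -*-
# Illustrative names only; nothing here comes from the original script.
apostrophe = u'\u2019'                          # the curly apostrophe

# UTF-8 encodes U+2019 as three bytes.
utf8_bytes = apostrophe.encode('utf-8')         # b'\xe2\x80\x99'

# Decoding those same bytes with the wrong codec gives mojibake.
as_cp1252 = utf8_bytes.decode('cp1252')         # u'\xe2\u20ac\u2122'
as_mac_roman = utf8_bytes.decode('mac_roman')   # u'\u201a\xc4\xf4'

# Decoding with the right codec gives the apostrophe back.
round_trip = utf8_bytes.decode('utf-8')
```

Comparing the results with what your viewer shows tells you which encoding the viewer assumed.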

Your file is a correct UTF-8 file, but you are viewing it incorrectly.
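If you would rather state the encoding explicitly than rely on the byte-string literal, one possible approach is `io.open` with a unicode string. This is a sketch under assumptions: the temporary directory is a stand-in for the script directory used in the question, and reading the file back as `cp1252` only simulates a viewer that guesses the wrong encoding.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

text = u'Smith\u2019s BaseBall Cap'

# Hypothetical location; the question writes next to the script instead.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')

# io.open encodes the unicode text to UTF-8 bytes on write.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Reading with the same encoding returns the original text.
with io.open(path, 'r', encoding='utf-8') as f:
    read_ok = f.read()

# Reading with the wrong encoding reproduces the garbled characters.
with io.open(path, 'r', encoding='cp1252') as f:
    read_wrong = f.read()
```

The bytes on disk are the same in both reads. Only the decoding step differs, which is why the fix belongs in the viewer rather than in the writing code.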

Ned Batchelder answered Dec 28 '22 02:12