
How can I create a file with utf-8 in Python?

I use open('test.txt', 'w') to create a new file, and file reports its character set as binary.

>>> open('test.txt', 'w')
<open file 'test.txt', mode 'w' at 0x7f6b973704b0>

$ file -i test.txt
test.txt: inode/x-empty; charset=binary

Next, I open the file with an explicit character set (say utf-8) using the codecs module. However, the charset is still reported as binary.

>>> codecs.open("test.txt", 'w', encoding='utf-8')
<open file 'test.txt', mode 'wb' at 0x7f6b97370540>

$ file -i test.txt 
test.txt: inode/x-empty; charset=binary

When I write some plain text to test.txt, the charset becomes us-ascii.

>>> fp = codecs.open("test.txt", 'w', encoding='utf-8')
>>> fp.write("wwwwwwwwwww")
>>> fp.close()

$ file -i test.txt 
test.txt: text/plain; charset=us-ascii

Now I try to write some special characters (say Arènes), but this fails:

>>> fp = codecs.open("test.txt", 'w', encoding='utf-8')
>>> fp.write("Arènes")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/codecs.py", line 688, in write
    return self.writer.write(data)
  File "/usr/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)

To be more specific, I would like to save a query result (fetched with python-mysqldb) into a file. The relevant code is as follows:

cur.execute("SELECT * FROM agency")

# Write to a file
with open('test.txt', 'w') as fp :
    for row in cur.fetchall() :
        s = '\t'.join(str(item) for item in row)
        fp.write(s + '\n')

Now the charset of test.txt is reported as iso-8859-1 (the data contains some French characters, such as Arènes).

So I use codecs.open('test.txt', 'w', encoding='utf-8') to create the file instead, but then I get the following error:

Traceback (most recent call last):
  File "./overlap_intervals.py", line 26, in <module>
    fp.write(s + '\n')
  File "/usr/lib/python2.7/codecs.py", line 688, in write
    return self.writer.write(data)
  File "/usr/lib/python2.7/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 21: ordinal not in range(128)

How can I create a file with utf-8 in Python?

asked Jun 28 '26 08:06 by SparkAndShine



1 Answer

file always reports an empty file as charset=binary, because there is no content from which it could detect an encoding.

$ touch /tmp/foo
$ file -i /tmp/foo 
/tmp/foo: inode/x-empty; charset=binary

Put something in it and everything is fine.

$ cat > /tmp/foo 
Rübe
Möhre
Mähne
$ file -i /tmp/foo
/tmp/foo: text/plain; charset=utf-8
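The outputs above follow from the bytes in the file, not from how the file was opened. As a rough illustration only (this is not how file is actually implemented, and guess_charset is a hypothetical helper), a content-based guess could look like this:

```python
def guess_charset(data):
    """Very rough content-based charset guess for a byte string.

    Hypothetical helper for illustration; the real `file` tool uses
    more elaborate heuristics.
    """
    if not data:
        return "binary"  # empty: nothing to inspect
    try:
        data.decode("ascii")
        return "us-ascii"  # every byte is below 0x80
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"  # high bytes form valid UTF-8 sequences
    except UnicodeDecodeError:
        return "not utf-8"  # e.g. iso-8859-1 or another 8-bit encoding


print(guess_charset(b""))
print(guess_charset(b"wwwwwwwwwww"))
print(guess_charset(u"Ar\u00e8nes".encode("utf-8")))
print(guess_charset(u"Ar\u00e8nes".encode("latin-1")))
```

This is why a file containing only ASCII characters shows up as us-ascii even when it was opened with encoding='utf-8': the two encodings produce identical bytes for such text.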

Python 3 will do the same as cat, provided its default encoding is UTF-8 (as it is on most Linux systems).

with open("/tmp/foo", "w") as f:
    f.write("Rübe\n")

Check it:

$ cat /tmp/foo
Rübe
$ file -i /tmp/foo
/tmp/foo: text/plain; charset=utf-8
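To avoid depending on the default encoding, the encoding can be passed explicitly. A minimal sketch, assuming io.open (which behaves the same way in Python 2.7 and Python 3) and a scratch file in a temporary directory instead of /tmp/foo:

```python
import io
import os
import tempfile

# Hypothetical scratch location, used here instead of /tmp/foo.
path = os.path.join(tempfile.mkdtemp(), "foo")

# Text mode with an explicit encoding: write() takes text (unicode) strings.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"R\u00fcbe\n")  # u"Rübe\n", written with an escape for portability

# Read the raw bytes back to see what actually landed on disk.
with io.open(path, "rb") as f:
    raw = f.read()

print(repr(raw))
```

The ü is stored as the two bytes 0xC3 0xBC, which is what makes file report charset=utf-8.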

Edit:

With Python 2.7, you must encode a Unicode string explicitly before writing it to a file opened with the built-in open().

with open("/tmp/foo", "w") as f:
    f.write(u"Rübe\n".encode("UTF-8"))
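For the query loop in the question, the errors come from mixing byte strings with a writer that expects Unicode: in Python 2.7, a codecs writer first decodes a byte string as ASCII, which fails on bytes such as 0xe9. One possible approach, sketched below, is to turn every value into text first and let the file object do the encoding. The names to_text, write_rows and db_charset are hypothetical, and the latin-1 default is only an assumption based on the iso-8859-1 output reported in the question.

```python
import io
import os
import tempfile


def to_text(item, db_charset="latin-1"):
    """Return item as a text (unicode) string.

    db_charset is an assumption: the encoding of byte strings
    returned by the database driver.
    """
    if isinstance(item, bytes):  # `bytes` is `str` on Python 2.7
        return item.decode(db_charset)
    return u"%s" % (item,)  # numbers, None, and text strings


def write_rows(rows, path):
    """Write rows as tab-separated UTF-8 text, one row per line."""
    with io.open(path, "w", encoding="utf-8") as fp:
        for row in rows:
            fp.write(u"\t".join(to_text(item) for item in row) + u"\n")


# Stand-in for cur.fetchall(): one latin-1 byte string, one text string.
rows = [(1, b"Ar\xe8nes"), (2, u"R\u00fcbe")]
path = os.path.join(tempfile.mkdtemp(), "test.txt")
write_rows(rows, path)

with io.open(path, "r", encoding="utf-8") as fp:
    content = fp.read()
print(repr(content))
```

If the driver already returns Unicode strings, the decode branch is simply never taken.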