When I use open() to open a file, I am not able to write unicode strings. I have learned that I need to use codecs and open the file with Unicode encoding (see http://docs.python.org/howto/unicode.html#reading-and-writing-unicode-data).
Now I need to create some temporary files. I tried to use the tempfile library, but it doesn't have any encoding option. When I try to write any unicode string in a temporary file with tempfile, it fails:
#!/usr/bin/python2.6
# -*- coding: utf-8 -*-
import tempfile
with tempfile.TemporaryFile() as fh:
    fh.write(u"Hello World: ä")
    fh.seek(0)
    for line in fh:
        print line
How can I create a temporary file with Unicode encoding in Python?
Edit:
I am using Linux and the error message that I get for this code is:
Traceback (most recent call last):
  File "tmp_file.py", line 5, in <module>
    fh.write(u"Hello World: ä")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 13: ordinal not in range(128)
Creating a Temporary File
The file is created using the TemporaryFile() function. By default, the file is opened in w+b mode, that is, it can be both read and written. Binary mode is used so that the file can hold any type of data. The file may not have a visible name in the file system.
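A minimal sketch of what the default binary mode means in practice, written for Python 3 (an assumption, since the question above targets Python 2.6). The variable names wrote_text and data are illustrative only.

```python
import tempfile

# With no mode argument the file is opened in binary read/write mode.
with tempfile.TemporaryFile() as fh:
    try:
        fh.write("Hello World: ä")  # text (str) is rejected in binary mode
        wrote_text = True
    except TypeError:
        wrote_text = False

    fh.write(b"raw bytes")  # bytes are accepted
    fh.seek(0)              # rewind before reading back
    data = fh.read()

print(wrote_text, data)
```

In Python 3 the text write is rejected outright with a TypeError, whereas Python 2 attempts an implicit ASCII conversion, which is where the UnicodeEncodeError in the question comes from.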
TemporaryDirectory()
This function creates a temporary directory. You can choose the location of this temporary directory with the dir parameter. For example, passing C:\python36 as dir creates the temporary directory inside that folder.
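As a rough illustration of the dir parameter, the sketch below uses the system temp folder from tempfile.gettempdir() as the parent instead of C:\python36, purely so that it runs on any platform. That substitution and the variable names are assumptions, not part of the original text.

```python
import os
import tempfile

# Assumption: use the system temp folder as the parent directory.
parent = tempfile.gettempdir()

with tempfile.TemporaryDirectory(dir=parent) as tmpdir:
    # Inside the block the directory exists under the chosen parent.
    existed = os.path.isdir(tmpdir)
    print(tmpdir)

# On leaving the block the directory and its contents are removed.
removed = not os.path.exists(tmpdir)
```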
Everyone else's answers are correct, I just want to clarify what's going on:
The difference between the literal 'foo' and the literal u'foo' is that the former is a string of bytes and the latter is a Unicode object.
First, understand that Unicode is the character set and UTF-8 is an encoding. The Unicode object is about the former: it's a Unicode string, not necessarily a UTF-8 one. In your case, the encoding for a string literal will be UTF-8, because you specified it in the first lines of the file.
To get a byte string from a Unicode string, you call the .encode() method (and .decode() to go the other way):
>>> u"ひらがな".encode("utf-8") == "ひらがな"
True
Similarly, you could call .encode() on your string in the write call and achieve the same effect as just removing the u.
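The same round trip can be sketched in Python 3 terms, where the Unicode type is str and the byte string type is bytes. This is an illustration under that assumption, not the Python 2 session shown above.

```python
text = "ひらがな"                    # Unicode string (str in Python 3)
encoded = text.encode("utf-8")      # Unicode -> bytes
decoded = encoded.decode("utf-8")   # bytes -> Unicode

# Four characters become twelve bytes: each of these takes 3 bytes in UTF-8.
print(len(text), len(encoded))
print(decoded == text)
```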
If you didn't specify the encoding at the top, say if you were reading the Unicode data from another file, you would have to state what encoding the data was in before it reached a Python string. That encoding determines how the text is represented in bytes (i.e., the str type).
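To make the "reading from another file" case concrete, here is a small Python 3 sketch. The file name latin1.txt and the choice of Latin-1 are hypothetical. The point is only that the same bytes need a declared encoding before they become text.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "latin1.txt")  # hypothetical file name

    # Write raw bytes: 0xE4 is "ä" in Latin-1 (not valid UTF-8 on its own).
    with open(path, "wb") as f:
        f.write(b"Hello World: \xe4")

    # State the encoding when reading so the bytes decode to the right text.
    with open(path, encoding="latin-1") as f:
        text = f.read()

print(text)
```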
The error you're getting, then, is only because the tempfile module is expecting a str object. This doesn't mean it can't handle Unicode, just that it expects you to pass in a byte string rather than a Unicode object, because without you specifying an encoding, it wouldn't know how to write it to the temp file. Python 2 falls back to the ASCII codec for the implicit conversion, and ASCII cannot represent ä.
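The failure can be reproduced explicitly. The sketch below (Python 3 syntax, an assumption) performs by hand the ASCII encoding that Python 2 attempts implicitly, then shows that naming UTF-8 succeeds.

```python
text = "Hello World: ä"

try:
    text.encode("ascii")      # what the implicit conversion amounts to
    failed_at = None
except UnicodeEncodeError as exc:
    failed_at = exc.start     # index of the first unencodable character

utf8_bytes = text.encode("utf-8")  # an explicit encoding works

print(failed_at, utf8_bytes)
```

The reported index matches the "position 13" in the traceback from the question.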
tempfile.TemporaryFile has an encoding option in Python 3:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
import tempfile
# Text mode ('w+') is required for the encoding argument to apply.
with tempfile.TemporaryFile(mode='w+', encoding='utf-8') as fh:
    fh.write("Hello World: ä")
    fh.seek(0)  # rewind before reading back
    for line in fh:
        print(line)
Note that now you need to specify mode='w+' instead of the default binary mode. Also note that string literals are implicitly Unicode in Python 3, so the u prefix is not needed.
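tempfile.NamedTemporaryFile accepts the same mode and encoding arguments in Python 3, which helps when the file needs a visible name. A minimal sketch, with the variable name lines chosen only for illustration:

```python
import tempfile

with tempfile.NamedTemporaryFile(mode="w+", encoding="utf-8") as fh:
    fh.write("Hello World: ä\n")
    fh.seek(0)  # rewind before reading back
    # Lines come back as str, already decoded from UTF-8.
    lines = [line.rstrip("\n") for line in fh]
    print(fh.name, lines)
```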
If you're stuck with Python 2.6, temporary files are always binary, and you need to encode the Unicode string before writing it to the file:
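#!/usr/bin/python
# -*- coding: utf-8 -*-
import tempfile
with tempfile.TemporaryFile() as fh:
    # Encode the Unicode object to bytes before writing.
    fh.write(u"Hello World: ä".encode('utf-8'))
    fh.seek(0)
    for line in fh:
        # Decode the bytes back to Unicode when reading.
        print line.decode('utf-8')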
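Since the question mentions codecs, another option is to wrap the binary temporary file in a codecs stream writer and reader so the encoding happens in one place. This is a sketch rather than a recommendation. It is written and checked against Python 3, although codecs.getwriter and codecs.getreader also exist in Python 2.

```python
import codecs
import tempfile

with tempfile.TemporaryFile() as fh:
    # The writer encodes Unicode text to UTF-8 bytes on each write.
    writer = codecs.getwriter("utf-8")(fh)
    writer.write(u"Hello World: ä")

    fh.seek(0)  # rewind the underlying binary file

    # The reader decodes the UTF-8 bytes back to Unicode text.
    reader = codecs.getreader("utf-8")(fh)
    text = reader.read()

print(text)
```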
Unicode specifies the character set, not the encoding, so in either case you need a way to specify how to encode the Unicode characters!