How to create a temporary file with Unicode encoding?

When I use open() to open a file, I am not able to write unicode strings. I have learned that I need to use codecs and open the file with Unicode encoding (see http://docs.python.org/howto/unicode.html#reading-and-writing-unicode-data).

Now I need to create some temporary files. I tried to use the tempfile library, but it doesn't have any encoding option. When I try to write any unicode string in a temporary file with tempfile, it fails:

#!/usr/bin/python2.6
# -*- coding: utf-8 -*-
import tempfile
with tempfile.TemporaryFile() as fh:
  fh.write(u"Hello World: ä")
  fh.seek(0)
  for line in fh:
    print line

How can I create a temporary file with Unicode encoding in Python?

Edit:

I am using Linux and the error message that I get for this code is:

Traceback (most recent call last):
  File "tmp_file.py", line 5, in <module>
    fh.write(u"Hello World: ä")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 13: ordinal not in range(128)

This is just an example. In practice I am trying to write a string that some API returned.

asked May 08 '12 00:05

dbarbosa

2 Answers

Everyone else's answers are correct; I just want to clarify what's going on:

The difference between the literal 'foo' and the literal u'foo' in Python 2 is that the former is a string of bytes and the latter is a Unicode object.

First, understand that Unicode is the character set and UTF-8 is an encoding. A Unicode object is about the former: it's a Unicode string, not necessarily a UTF-8 one. In your case, the encoding of a byte string literal will be UTF-8, because you specified it in the first lines of the file.

To get a byte string from a Unicode string, you call the .encode() method:

>>> u"ひらがな".encode("utf-8") == "ひらがな"
True

Similarly, you could call .encode() on your string in the write call and achieve the same effect as just removing the u prefix.
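As a rough illustration of the encode/decode relationship described above, here is a minimal sketch. It assumes a Python version where the u prefix is accepted (Python 2, or Python 3.3 and later); the variable names are made up for this example.

```python
# -*- coding: utf-8 -*-
text = u"ひらがな"               # Unicode string: a sequence of code points
data = text.encode("utf-8")      # byte string: those code points encoded as UTF-8
restored = data.decode("utf-8")  # decoding with the same codec recovers the text

# Each of these four characters needs three bytes in UTF-8,
# so the two lengths differ even though the content is "the same".
char_count = len(text)
byte_count = len(data)
```

The point of the sketch is only that .encode() goes from text to bytes and .decode() goes back, and that both directions need a named encoding.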

If you hadn't specified the encoding at the top, say because you were reading the Unicode data from another file, you would have to specify which encoding the data was in before it became a Python string. The encoding determines how the text is represented in bytes (i.e., the str type).

The error you're getting, then, occurs only because the tempfile module expects a str object. This doesn't mean it can't handle Unicode, just that it expects you to pass in a byte string rather than a Unicode object. Without an encoding from you, it doesn't know how to write the text to the temp file, so Python 2 falls back to the ASCII codec, which cannot encode ä.
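One way to avoid calling .encode() on every write is to wrap the binary temp file with the codecs module that the question already mentions. The following is a sketch under that assumption, not the only approach; roundtrip is a hypothetical helper name introduced here.

```python
# -*- coding: utf-8 -*-
import codecs
import tempfile


def roundtrip(text, encoding="utf-8"):
    """Write text to a binary temp file via a codecs writer, then read it back."""
    with tempfile.TemporaryFile() as fh:          # binary mode by default
        writer = codecs.getwriter(encoding)(fh)   # encodes text on write
        writer.write(text)
        fh.seek(0)                                # rewind the underlying file
        reader = codecs.getreader(encoding)(fh)   # decodes bytes on read
        return reader.read()


result = roundtrip(u"Hello World: ä")
```

The wrapper makes the encoding choice explicit in one place, which is exactly the information the plain temp file is missing.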

answered Sep 22 '22 15:09

dfb

tempfile.TemporaryFile has an encoding option in Python 3:

#!/usr/bin/python3
# -*- coding: utf-8 -*-
import tempfile
with tempfile.TemporaryFile(mode='w+', encoding='utf-8') as fh:
  fh.write("Hello World: ä")
  fh.seek(0)
  for line in fh:
    print(line)

Note that you now need to specify mode='w+' instead of the default binary mode. Also note that string literals are implicitly Unicode in Python 3, so no u prefix is needed.
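To see why the mode matters, the sketch below (Python 3 only) first tries to write a str to a temp file opened in the default binary mode, which is expected to raise a TypeError, and then repeats the write in text mode. The variable names are illustrative.

```python
import tempfile

error = None
with tempfile.TemporaryFile() as fh:       # default mode is 'w+b' (binary)
    try:
        fh.write("Hello World: ä")         # a str, but the file wants bytes
    except TypeError as exc:
        error = exc                        # kept for inspection

# Text mode with an explicit encoding accepts str directly.
with tempfile.TemporaryFile(mode="w+", encoding="utf-8") as fh:
    fh.write("Hello World: ä")
    fh.seek(0)
    text = fh.read()
```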

If you're stuck with Python 2.6, temporary files are always binary, and you need to encode the Unicode string before writing it to the file:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import tempfile
with tempfile.TemporaryFile() as fh:
  fh.write(u"Hello World: ä".encode('utf-8'))
  fh.seek(0)
  for line in fh:
    print line.decode('utf-8')

Unicode specifies the character set, not the encoding, so in either case you need a way to specify how to encode the Unicode characters!
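The effect of the encoding choice can be made visible by writing the same character with two different encodings and inspecting the raw bytes. This is a sketch for Python 3; bytes_on_disk is a hypothetical helper, and delete=False is used only so the file can be reopened by name on any platform.

```python
import os
import tempfile


def bytes_on_disk(text, encoding):
    """Write text to a named temp file with the given encoding; return the raw bytes."""
    with tempfile.NamedTemporaryFile(mode="w", encoding=encoding, delete=False) as fh:
        fh.write(text)
        path = fh.name
    try:
        with open(path, "rb") as raw:      # reopen in binary mode to see the bytes
            return raw.read()
    finally:
        os.remove(path)                    # clean up, since delete=False was used


utf8_bytes = bytes_on_disk("ä", "utf-8")       # two bytes in UTF-8
latin1_bytes = bytes_on_disk("ä", "latin-1")   # one byte in Latin-1
```

The same Unicode character ends up as different byte sequences, which is why the file has to be told which encoding to use.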

answered Sep 18 '22 15:09

Seppo Enarvi
