write()-ing an encoded string in Python 3.x

Tags:

python-3.x

unicode

I've got a unicode string (s) which I want to write into a file.

In Python 2 I could write:

open('filename', 'w').write(s.encode('utf-8'))

But this fails for Python 3. Apparently, s.encode() returns something of type 'bytes', which the write() function does not accept:

TypeError: must be str, not bytes
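Here is a minimal reproduction of the mismatch. In Python 3, `str.encode` returns `bytes`, while a file opened with mode `'w'` is a text stream that only accepts `str`. The scratch file name `encode_demo.txt` is hypothetical, and the exact wording of the error message varies between Python versions.

```python
import os
import tempfile

# "naïveté" as a Python 3 str
s = "na\N{LATIN SMALL LETTER I WITH DIAERESIS}vet\N{LATIN SMALL LETTER E WITH ACUTE}"
encoded = s.encode("utf-8")
print(type(encoded))  # <class 'bytes'>

# Hypothetical scratch file in the system temp directory
path = os.path.join(tempfile.gettempdir(), "encode_demo.txt")

try:
    # Mode 'w' opens a text stream: it accepts str, so bytes are rejected
    with open(path, "w") as f:
        f.write(encoded)
except TypeError as exc:
    print("TypeError:", exc)
```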

Does anyone know how to port the above code to Python 3?

Edit:

Thanks to all of you who proposed using binary mode! Unfortunately, this causes a problem with the \n characters. Is there any way to achieve the same result I had with Python 2 (namely to encode non-ASCII characters in UTF-8 while keeping the OS-specific rendition of \n)?

Thanks!


asked Sep 10 '11 16:09

Tom

1 Answer

You do not want to muck around with manually encoding each and every piece of data like that! Simply pass the encoding as an argument to open, like this:

#!/usr/bin/env python3

slist = [
    "Ca\N{LATIN SMALL LETTER N WITH TILDE}on City",
    "na\N{LATIN SMALL LETTER I WITH DIAERESIS}vet\N{LATIN SMALL LETTER E WITH ACUTE}",
    "fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade",
    "\N{GREEK SMALL LETTER BETA}-globulin"
]

with open("/tmp/sample.utf8", mode="w", encoding="utf8") as f:
    for s in slist:
        print(s, file=f)

Now if you look at the file you made, you’ll see that it says:

$ cat /tmp/sample.utf8
Cañon City
naïveté
façade
β-globulin

And you can see that those are the right code points this way:

$ uniquote -x /tmp/sample.utf8
Ca\x{F1}on City
na\x{EF}vet\x{E9}
fa\x{E7}ade
\x{3B2}-globulin
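`uniquote` is a separate command-line tool, not part of Python. If you don't have it installed, here is a rough stand-in written in Python. The helper `escape_non_ascii` is hypothetical. It reproduces only the `\x{HEX}` output shown above and none of uniquote's other options. Note that the file is read back with the same `encoding` argument, so the stream does the decoding.

```python
import os
import tempfile

def escape_non_ascii(text: str) -> str:
    # Keep ASCII characters as-is; show every other code point as \x{HEX}
    # (uppercase hex, no zero padding), mimicking the output above
    return "".join(c if ord(c) < 0x80 else "\\x{%X}" % ord(c) for c in text)

slist = [
    "Ca\N{LATIN SMALL LETTER N WITH TILDE}on City",
    "na\N{LATIN SMALL LETTER I WITH DIAERESIS}vet\N{LATIN SMALL LETTER E WITH ACUTE}",
    "fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade",
    "\N{GREEK SMALL LETTER BETA}-globulin",
]

# Hypothetical path; any writable location works
path = os.path.join(tempfile.gettempdir(), "sample_codepoints.utf8")
with open(path, mode="w", encoding="utf-8") as f:
    for s in slist:
        print(s, file=f)

# Text mode decodes UTF-8 and normalizes line endings to "\n" on read
with open(path, encoding="utf-8") as f:
    escaped = [escape_non_ascii(line.rstrip("\n")) for line in f]

for line in escaped:
    print(line)
```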

See how much easier that is? Let the stream object handle any low-level encoding or decoding for you.
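This also addresses the newline problem raised in the question's edit. By default (`newline=None`), a text-mode file translates every `\n` you write into the platform's line separator, `os.linesep`. Binary mode writes your bytes untouched, so each `\n` stays a bare `\n`. The sketch below compares the two modes. It assumes a hypothetical scratch file named `newline_demo.txt`.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "newline_demo.txt")  # hypothetical path
line = "fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade\n"

# Text mode with an explicit encoding: "\n" is written as os.linesep
with open(path, "w", encoding="utf-8") as f:
    f.write(line)
with open(path, "rb") as f:
    text_mode_bytes = f.read()

# Binary mode: we encode by hand and "\n" is written as a bare b"\n"
with open(path, "wb") as f:
    f.write(line.encode("utf-8"))
with open(path, "rb") as f:
    binary_mode_bytes = f.read()

print(text_mode_bytes.endswith(os.linesep.encode("ascii")))  # True on every platform
print(binary_mode_bytes.endswith(b"\n"))                     # True; no translation
```

If you ever want a literal `\n` regardless of OS while still letting the file handle encoding, pass `newline="\n"` to `open`.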

Summary: Don't call encode or decode yourself when all you are doing is processing a homogeneous stream that is entirely in one encoding. That's way too much bother for zero gain. Pass the encoding argument to open once and for all.
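The same principle also works when you receive a binary stream that you did not open yourself. The standard library's `io.TextIOWrapper` lets you wrap that stream once with an encoding, after which you write plain `str`. This is a related stdlib technique rather than part of the answer above. In the sketch, an in-memory `io.BytesIO` stands in for the binary stream.

```python
import io

# Stand-in for any binary stream you were handed
raw = io.BytesIO()

# Wrap it once; every str written through the wrapper is encoded as UTF-8.
# newline="\n" disables line-ending translation so the bytes are predictable.
text = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
print("\N{GREEK SMALL LETTER BETA}-globulin", file=text)
text.flush()  # push buffered text down into raw

data = raw.getvalue()
print(data)  # b'\xce\xb2-globulin\n'
```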

answered Oct 22 '22 14:10

tchrist
