I've got a unicode string (s) which I want to write into a file.
In Python 2 I could write:
open('filename', 'w').write(s.encode('utf-8'))
But this fails for Python 3. Apparently, s.encode() returns something of type 'bytes', which the write() function does not accept:
TypeError: must be str, not bytes
Does anyone know how to port the above code to Python 3?
Edit:
Thanks to all of you who proposed using binary mode! Unfortunately, this causes a problem with the \n characters. Is there any way to achieve the same result I had with Python 2 (namely to encode non-ANSI characters in UTF-8 while keeping the OS-specific rendition of \n)?
Thanks!
The encode() method encodes the string, using the specified encoding. If no encoding is specified, UTF-8 will be used.
Python encode() and decode() Functions. Python's encode and decode methods are used to encode and decode the input string, using a given encoding.
Since Python 3.0, strings are stored as Unicode, i.e. each character in the string is represented by a code point. So, each string is just a sequence of Unicode code points. For efficient storage of these strings, the sequence of code points is converted into a set of bytes. The process is known as encoding.
You do not want to muck around with manually encoding each and every piece of data like that! Simply pass the encoding as an argument to open
, like this:
#!/usr/bin/env python3.2
slist = [
"Ca\N{LATIN SMALL LETTER N WITH TILDE}on City",
"na\N{LATIN SMALL LETTER I WITH DIAERESIS}vet\N{LATIN SMALL LETTER E WITH ACUTE}",
"fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade",
"\N{GREEK SMALL LETTER BETA}-globulin"
]
with open("/tmp/sample.utf8", mode="w", encoding="utf8") as f:
for s in slist:
print(s, file=f)
Now if you the file you made, you’ll see that it says:
$ cat /tmp/sample.utf8
Cañon City
naïveté
façade
β-globulin
And you can see that those are the right code points this way:
$ uniquote -x /tmp/sample.utf
Ca\x{F1}on City
na\x{EF}vet\x{E9}
fa\x{E7}ade
\x{3B2}-globulin
See how much easier that is? Let the stream object handle any low-level encoding or decoding for you.
Summary: Don't call encode
or decode
yourself when all you are doing is using them to process a homogeneous stream that's all of it in the same encoding. That's way too much bother for zero gain. Use the encoding
argument just once and for all.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With