Writing and then reading a string in file encoded in latin1

Tags: python, io, latin1

Here are two Python 3 code samples. The first one writes two files with Latin-1 encoding:

s='On écrit ça dans un fichier.'
with open('spam1.txt', 'w',encoding='ISO-8859-1') as f:
    print(s, file=f)
with open('spam2.txt', 'w',encoding='ISO-8859-1') as f:
    f.write(s)

The second one reads the same files with the same encoding:

with open('spam1.txt', 'r',encoding='ISO-8859-1') as f:
    s1=f.read()
with open('spam2.txt', 'r',encoding='ISO-8859-1') as f:
    s2=f.read()

Now, when I print s1 and s2, I get

On Ã©crit Ã§a dans un fichier.

instead of the initial "On écrit ça dans un fichier."

What is wrong? I also tried io.open, but I am still missing something. The funny part is that I had no such problem with Python 2.7 and its str.decode method, which is now gone...

Could someone help me?


asked Jul 22 '13 14:07

François Coulombeau

1 Answer

Your data was written out as UTF-8:

>>> 'On écrit ça dans un fichier.'.encode('utf8').decode('latin1')
'On Ã©crit Ã§a dans un fichier.'

This means either that you did not write out Latin-1 data, or that your source code was saved as UTF-8 but you declared your script (using a PEP 263-compliant header) to be Latin-1 instead.

If you saved your Python script with a header like:

# -*- coding: latin-1 -*-

but your text editor saved the file with UTF-8 encoding instead, then the string literal:

s='On écrit ça dans un fichier.'

will be misinterpreted by Python in the same manner: each UTF-8 byte of the literal is decoded as a separate Latin-1 character. Saving the resulting Unicode string to disk as Latin-1 and then reading it back as Latin-1 preserves the error.
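The effect can be reproduced without changing any editor settings. The sketch below is an illustration, not part of the original scripts: it simulates the mismatch by encoding the literal as UTF-8 (what the editor is assumed to have saved) and decoding it as Latin-1 (what the header is assumed to declare), then round-trips the result through a temporary file. The file name spam_demo.txt is made up for the example.

```python
import os
import tempfile

original = 'On \u00e9crit \u00e7a dans un fichier.'

# Simulate the mismatch: UTF-8 bytes on disk, decoded as Latin-1 by Python.
misread = original.encode('utf-8').decode('latin-1')

# Round-trip the already-damaged string through a Latin-1 file.
path = os.path.join(tempfile.mkdtemp(), 'spam_demo.txt')  # hypothetical file name
with open(path, 'w', encoding='ISO-8859-1') as f:
    f.write(misread)
with open(path, 'r', encoding='ISO-8859-1') as f:
    read_back = f.read()

print(read_back == misread)   # True: the file round trip itself is lossless
print(read_back == original)  # False: the damage happened before writing
```

If this matches what you see, the file I/O is behaving correctly and the problem lies in how the source file was decoded.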

To debug, please take a close look at print(s.encode('unicode_escape')) in the first script. If it looks like:

b'On \\xc3\\xa9crit \\xc3\\xa7a dans un fichier.'

then the actual encoding of your source file and the PEP 263 header disagree on how the source code should be interpreted. If your source code is decoded correctly, the output is:

b'On \\xe9crit \\xe7a dans un fichier.'
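A complementary check, offered here as a suggestion rather than something from the scripts above, is to read the written file in binary mode and look at the raw bytes. In genuine Latin-1, é is the single byte E9; if the string was already damaged before writing, the file contains the two bytes C3 A9 instead. The helper name e_acute_bytes and the file names are hypothetical.

```python
import os
import tempfile

def e_acute_bytes(path):
    """Report how 'é' is stored in the file, reading raw bytes (no decoding)."""
    with open(path, 'rb') as f:
        raw = f.read()
    if b'\xc3\xa9' in raw:
        return 'C3 A9'   # two bytes: the string was damaged before writing
    if b'\xe9' in raw:
        return 'E9'      # one byte: genuine Latin-1
    return None          # no é in the file at all

tmp = tempfile.mkdtemp()
good = 'On \u00e9crit \u00e7a dans un fichier.'
bad = good.encode('utf-8').decode('latin-1')   # simulated misread literal

results = {}
for name, text in (('good.txt', good), ('bad.txt', bad)):
    path = os.path.join(tmp, name)
    with open(path, 'w', encoding='ISO-8859-1') as f:
        f.write(text)
    results[name] = e_acute_bytes(path)

print(results)  # {'good.txt': 'E9', 'bad.txt': 'C3 A9'}
```

Calling the helper on spam1.txt or spam2.txt would show which of the two cases applies to your files.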

If Spyder is stubbornly ignoring the PEP-263 header and reading your source as Latin-1 regardless, avoid using non-ASCII characters and use escape codes instead; either using \uxxxx unicode code points:

s = 'On \u00e9crit \u00e7a dans un fichier.'

or \xaa one-byte escape codes for code-points below 256:

s = 'On \xe9crit \xe7a dans un fichier.'
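As a quick sanity check, the two escaped spellings should compare equal and encode to single Latin-1 bytes. The sketch below also shows a common way to repair a string that has already been garbled, by reversing the faulty decode; that repair step is an addition for illustration and assumes the damage is exactly a UTF-8 to Latin-1 misread.

```python
s_u = 'On \u00e9crit \u00e7a dans un fichier.'   # \uxxxx code points
s_x = 'On \xe9crit \xe7a dans un fichier.'       # \xhh one-byte escapes

print(s_u == s_x)             # True: both literals are the same string
print(s_u.encode('latin-1'))  # b'On \xe9crit \xe7a dans un fichier.'

# Repairing an already-garbled string: undo the wrong Latin-1 decode,
# then decode the recovered bytes as UTF-8.
garbled = s_u.encode('utf-8').decode('latin-1')
repaired = garbled.encode('latin-1').decode('utf-8')
print(repaired == s_u)        # True
```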

answered Oct 01 '22 19:10

Martijn Pieters
