How to read a "C source, ISO-8859 text"

Question

I have a file, myfile (which I have pasted; I hope the relevant data with the problems has survived the copy/paste). I try to read that file with:

import codecs
codecs.open('myfile', 'r', 'utf-8').read()

But this gives:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xe5 in position 7128: invalid continuation byte

If I check the file:

» file myfile
myfile: C source, ISO-8859 text

How can I read that kind of file (ISO-8859) in Python?
In the general case, how can I know how a file is encoded?

I often deal with files that I did not generate myself (system files, random files downloaded from the internet, files contributed by providers, customers, ...), and those files give no clue about the encoding they use. In a multi-cultural environment (Europe), it is difficult to know how they have been encoded. Most of the time, even the person providing the files has no idea about the encoding, which is chosen behind the scenes by their editor or tool. How can I be sure about the encoding being used, on a file-by-file basis?

David Michael Gang · Accepted Answer

With Python 3.3 you can use the built-in open function:

with open("myfile", encoding="ISO-8859-1") as f:
    text = f.read()

Martijn Pieters · Answer

Change the codec in the codecs.open() call. The ISO-8859 standard defines multiple codecs; I picked Latin-1 (ISO-8859-1) here, but you may need to pick another one:

codecs.open('myfile', 'r', 'iso-8859-1').read()

See the codecs module for a list of valid codecs. Judging by the pastie data, iso-8859-1 is the correct codec to use, as it is suited for Scandinavian text.
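To illustrate why the choice among the ISO-8859 codecs matters, here is a small sketch using the byte from the traceback (0xe5). The three codecs compared are just examples I picked; the point is only that the same byte maps to a different character in each, and that on its own it is not valid UTF-8:

```python
# The byte reported in the UnicodeDecodeError from the question.
raw = b"\xe5"

# Decode the same byte with a few ISO-8859 parts (an arbitrary selection).
decoded = {
    codec: raw.decode(codec)
    for codec in ("iso-8859-1", "iso-8859-5", "iso-8859-7")
}
for codec, char in decoded.items():
    print(codec, repr(char))

# On its own, this byte is not a complete UTF-8 sequence, so decoding fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print("valid UTF-8:", utf8_ok)
```

None of the ISO-8859 decodes raises an error, so a successful decode does not prove the codec is the right one; you still have to check that the resulting text makes sense.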

Generally, without other sources, you cannot know what codec a file uses. At best, you can guess (which is what file does).
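If you have to guess, one common approach is to try a short list of candidate codecs in order and keep the first one that decodes without error. The sketch below assumes that approach; read_with_fallback and its default candidate list are hypothetical names and choices of mine, not part of any library. Note that iso-8859-1 accepts every possible byte, so it only makes sense as the last candidate:

```python
def read_with_fallback(path, candidates=("utf-8", "iso-8859-1")):
    """Return (text, codec) for the first candidate codec that decodes cleanly.

    Hypothetical helper: the candidate list is an assumption and should be
    adapted to the encodings you actually expect to receive.
    """
    # Read raw bytes once, then try each codec against the same data.
    with open(path, "rb") as f:
        raw = f.read()
    for codec in candidates:
        try:
            return raw.decode(codec), codec
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate codecs could decode %r" % path)
```

Calling read_with_fallback('myfile') on the file from the question would be expected to skip utf-8 (it raises the error shown above) and fall through to iso-8859-1. This is still only a guess: a clean decode tells you the bytes are valid in that codec, not that the codec is the one the author used.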
