How to use encoding utf-8.py instead of cp1252.py in Python

Question

I have written a very small program that copies every line containing a certain string from one file to another. Here is the complete source:

f_in = open("all.txt", "r")
f_out = open("all.out", "w")

for line in f_in:
    if "<title>" in line:
        f_out.write(line)

f_out.close()
f_in.close()

That works very well until it reaches a UTF-8 character in all.txt. Then it fails with:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 7102: character maps to <undefined>

Now I did a BAD workaround: In the directory \Python\Lib\encodings I have copied utf-8.py and renamed it to cp1252.py.

From then on, the little program above runs with no problem. But there must be a more elegant solution. Can you tell me what is needed to make Python use utf-8.py instead of cp1252.py?

I am sure this is possible with no heavy conversion and decoding and whatever - just tell Python to use another decoding instead of cp1252.py.

NostraDavid · Accepted Answer

Run python with the -X utf8 option.

I had the following error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u0141' in position 10: character maps to <undefined>

I was already using with open(filepath, "r+", encoding="utf-8") as yaml_file: (an explicit encoding), as one would expect, but Windows kept using cp1252.py anyway, which was driving me up the wall because it kept causing the error above.

Anyway, running python -X utf8 .\script.py fixed my woes.
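To check whether the option actually took effect, it can help to print the settings Python uses to pick a default codec. The snippet below is only a sketch: describe_text_defaults is a made-up helper name, and it assumes Python 3.7 or newer, which is where -X utf8 and sys.flags.utf8_mode were introduced. Setting the environment variable PYTHONUTF8=1 enables the same mode.

```python
import locale
import sys


def describe_text_defaults():
    """Collect the settings that decide which codec open() uses by default.

    Hypothetical helper, not part of the original answer.
    """
    return {
        # True when Python was started with -X utf8 or PYTHONUTF8=1.
        "utf8_mode": bool(sys.flags.utf8_mode),
        # The codec open() falls back to when no encoding= is passed.
        "preferred_encoding": locale.getpreferredencoding(False),
        # The codec print() uses; stdout may have been replaced, hence getattr.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }
```

Calling describe_text_defaults() once under plain python and once under python -X utf8 should show preferred_encoding switching from cp1252 to UTF-8 on a typical Western Windows setup.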

Martijn Pieters · Answer

Use io.open() to read and write Unicode values instead:

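import io

# Name the encoding explicitly so the platform default (cp1252) is never used.
with io.open('all.txt', 'r', encoding='utf8') as f_in:
    with io.open('all.out', 'w', encoding='utf8') as f_out:
        for line in f_in:
            # Copy only the lines that contain the marker string.
            if u"<title>" in line:
                f_out.write(line)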

Renaming codec files is the last thing you should do.
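On Python 3 the built-in open() is the same function as io.open(), so the import is optional there. Here is a rough sketch of the same idea as a reusable function; the name copy_matching_lines and its parameters are my own and not from the question. As a side note, byte 0x9d is unassigned in cp1252 but is the last byte of the UTF-8 encoding of the right double quotation mark (U+201D), which is probably what triggered the original error.

```python
def copy_matching_lines(src_path, dst_path, needle="<title>", encoding="utf-8"):
    """Copy every line of src_path that contains needle to dst_path.

    Both files are opened with an explicit encoding, so the platform
    default (cp1252 on many Windows systems) is never consulted.
    Returns the number of lines written.
    """
    copied = 0
    # A single with statement closes both files, even if an error occurs.
    with open(src_path, "r", encoding=encoding) as f_in, \
            open(dst_path, "w", encoding=encoding) as f_out:
        for line in f_in:
            if needle in line:
                f_out.write(line)
                copied += 1
    return copied
```

For the files in the question the call would be copy_matching_lines("all.txt", "all.out").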

Tags:

python

encoding

utf-8

user1131536
