 

How to use encoding utf-8.py instead of cp1252.py in Python

I have written a very small program that copies every line of one file that contains a certain string into another file. Here is the complete source:

f_in = open("all.txt", "r")
f_out = open("all.out", "w")

for line in f_in:
    if "<title>" in line:
        f_out.write(line)

f_out.close()
f_in.close()

That works very well until it reaches a UTF-8 character in all.txt. Then it fails with:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 7102: character maps to <undefined>
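Some background on where this error comes from: when open() is called without an encoding argument, it falls back to the platform's preferred encoding, which on many Windows systems is cp1252, and cp1252 has no character for byte 0x9d. A minimal sketch that reproduces the failure, assuming the file really is UTF-8; the sample bytes are made up for illustration and are not taken from all.txt:

```python
import locale

# Encoding that open() falls back to when no encoding= argument is given.
# On many Windows systems this is 'cp1252'; on most Linux systems, 'UTF-8'.
print("default:", locale.getpreferredencoding(False))

# Sample bytes (illustrative only): U+201D RIGHT DOUBLE QUOTATION MARK encoded
# as UTF-8 is E2 80 9D, which contains the byte 0x9d from the error message.
data = "\u201dquoted".encode("utf-8")

error = None
try:
    data.decode("cp1252")  # cp1252 has no character mapped to byte 0x9d
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; 'exc' is cleared after the except block
    print(exc)   # 'charmap' codec can't decode byte 0x9d in position 2: ...

text = data.decode("utf-8")  # the correct codec decodes the same bytes cleanly
```

A UTF-8 file containing typographic quotes or dashes is a typical way to hit exactly this byte.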

So I applied a BAD workaround: in the directory \Python\Lib\encodings, I copied utf-8.py and renamed it to cp1252.py.

Since then, the little program above runs without any problem. But there must be a more elegant solution. Can you tell me what is needed to make Python use utf-8.py instead of cp1252.py?

I am sure this is possible without heavy conversion or manual decoding: just tell Python to use a different encoding instead of cp1252.py.

user1131536 asked Sep 01 '25 22:09


2 Answers

Run Python with the -X utf8 option, which enables UTF-8 mode (available since Python 3.7).

I had the following error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u0141' in position 10: character maps to <undefined>

I opened the file with an explicit encoding, as one would expect: with open(filepath, "r+", encoding="utf-8") as yaml_file:. But Windows was being poopy and kept using cp1252.py anyway, which drove me up the wall because it kept causing the error above.

Anyway, running python -X utf8 .\script.py fixed my woes.
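UTF-8 mode can also be switched on with the environment variable PYTHONUTF8=1 instead of the command-line flag; either way it makes open() default to UTF-8 and switches the standard streams to UTF-8 as well. A small sketch to confirm the mode is actually active, assuming Python 3.7 or later; it starts a child interpreter with the same executable and asks it to report its settings:

```python
import subprocess
import sys

# Code run inside the child interpreter: print the UTF-8 mode flag and
# the encoding open() would use when no encoding= argument is given.
probe = "import sys, locale; print(sys.flags.utf8_mode, locale.getpreferredencoding(False))"

result = subprocess.run(
    [sys.executable, "-X", "utf8", "-c", probe],
    capture_output=True,  # collect the child's stdout/stderr
    text=True,            # decode the captured output to str
    check=True,           # raise CalledProcessError if the child fails
)
print(result.stdout.strip())  # first field is 1 when UTF-8 mode is active
```

Checking sys.flags.utf8_mode inside your own script is a quick way to tell whether the flag or environment variable was picked up.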

NostraDavid answered Sep 04 '25 03:09


Use io.open() with an explicit encoding to read and write Unicode text instead (in Python 3, io.open() is the same function as the built-in open()):

import io

with io.open('all.txt', 'r', encoding='utf8') as f_in:
    with io.open('all.out', 'w', encoding='utf8') as f_out:
        for line in f_in:
            if u"<title>" in line:
                f_out.write(line)

Renaming codec files is the last thing you should do: it silently changes what cp1252 means for every program run by that Python installation.
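Because io.open() is the built-in open() in Python 3, the same fix works without the import. A minimal sketch of the question's filter written that way, assuming all.txt really is UTF-8; the function name copy_matching_lines is hypothetical, and the demo writes to a temporary directory rather than the question's files:

```python
import os
import tempfile


def copy_matching_lines(src, dst, needle="<title>", encoding="utf-8"):
    """Copy every line of src that contains needle into dst.

    Both files are opened with an explicit encoding, so the result does not
    depend on the platform default (cp1252 on many Windows systems).
    Returns the number of lines written.
    """
    count = 0
    with open(src, "r", encoding=encoding) as f_in, \
         open(dst, "w", encoding=encoding) as f_out:
        for line in f_in:
            if needle in line:
                f_out.write(line)
                count += 1
    return count


# Demo with a throwaway input file containing non-ASCII text.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "all.txt")
    dst = os.path.join(tmp, "all.out")
    with open(src, "w", encoding="utf-8") as f:
        f.write("<title>\u201cQuoted\u201d title</title>\nbody text\n<title>Caf\u00e9</title>\n")
    written = copy_matching_lines(src, dst)
    with open(dst, encoding="utf-8") as f:
        copied = f.read()

print(written)  # 2
```

If the input may contain bytes that are not valid UTF-8, passing errors="replace" to open() substitutes U+FFFD for them instead of raising UnicodeDecodeError.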

Martijn Pieters answered Sep 04 '25 03:09