I have written a very small program that copies every line of one file to another file if the line contains a certain string. Here is the complete source:
f_in = open("all.txt", "r")
f_out = open("all.out", "w")
for line in f_in:
    if "<title>" in line:
        f_out.write(line)
f_out.close()
f_in.close()
That works very well until it reaches a UTF-8 character in all.txt. Then it fails with:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 7102: character maps to <undefined>
Now I did a BAD workaround: In the directory \Python\Lib\encodings I have copied utf-8.py and renamed it to cp1252.py.
From now on the little program above runs with no problem. But there must be a more elegant solution. Can you tell me what is needed to make Python use utf-8.py instead of cp1252.py?
I am sure this is possible with no heavy conversion and decoding and whatever - just tell Python to use another decoding instead of cp1252.py.
Run Python with the -X utf8 option.
I had the following error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u0141' in position 10: character maps to <undefined>
I was already using with open(filepath, "r+", encoding="utf-8") as yaml_file: (explicit encoding), as one would expect, but Windows was being poopy and kept using cp1252.py, which was driving me up the wall because it kept causing the error above.
Anyway, running python -X utf8 .\script.py fixed my woes.
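To check whether the -X utf8 option actually took effect, a small diagnostic like the one below may help. It is only a sketch: the helper name describe_text_encoding is made up for illustration, and it assumes Python 3.7 or later, where sys.flags.utf8_mode exists. Setting the environment variable PYTHONUTF8=1 should enable the same mode without changing the command line.

```python
import locale
import sys


def describe_text_encoding():
    """Collect the settings that influence the default text encoding.

    Hypothetical helper, not part of any library.
    """
    return {
        # True when Python runs in UTF-8 mode (-X utf8 or PYTHONUTF8=1).
        "utf8_mode": bool(sys.flags.utf8_mode),
        # Encoding that open() falls back to when no encoding= is passed.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding used by print(); a console mismatch here can also
        # raise UnicodeEncodeError even if files are opened correctly.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in describe_text_encoding().items():
        print(f"{name}: {value}")
```

Running this once as python script.py and once as python -X utf8 script.py shows which values change on your machine.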
Use io.open() to read and write Unicode values instead:
import io
with io.open('all.txt', 'r', encoding='utf8') as f_in:
    with io.open('all.out', 'w', encoding='utf8') as f_out:
        for line in f_in:
            if u"<title>" in line:
                f_out.write(line)
Renaming codec files is the last thing you should do.
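On Python 3, io.open is the same function as the built-in open, so the import is optional there. The following is a minimal sketch of the same filter written that way; the function name copy_matching_lines and the sample text are assumptions made for illustration, and the demo writes to a temporary directory rather than touching all.txt.

```python
import os
import tempfile


def copy_matching_lines(src_path, dst_path, needle="<title>"):
    """Copy each line containing `needle` from src_path to dst_path.

    Both files are opened as UTF-8 explicitly, so the platform default
    (cp1252 on many Windows setups) is never consulted.
    Returns the number of lines copied.
    """
    copied = 0
    with open(src_path, "r", encoding="utf-8") as f_in, \
            open(dst_path, "w", encoding="utf-8") as f_out:
        for line in f_in:
            if needle in line:
                f_out.write(line)
                copied += 1
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "all.txt")
        dst = os.path.join(tmp, "all.out")
        # U+201D encodes to bytes E2 80 9D in UTF-8; 0x9D is the byte
        # that cp1252 cannot decode in the original error message.
        with open(src, "w", encoding="utf-8") as f:
            f.write("<title>Caf\u00e9 \u201cquoted\u201d</title>\n")
            f.write("<p>not copied</p>\n")
        print(copy_matching_lines(src, dst))
```

Passing encoding= on both the input and the output file keeps the script independent of the machine's locale, which is why no codec files need to be touched.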