I am using Python to read a text file of data line by line. One of the lines contains a degree symbol, and I want to alter this part of the string. My script uses line = line.replace("TEMP [°C]", "TempC"). My code reaches this line but does not change the string at all, nor does it throw an error. Clearly there is something about my replace such that the script does not see 'TEMP [°C]' as existing in my string.
To insert the degree sign in my script, I had to change the encoding to UTF-8 in my IDE file settings. I have also included the following lines at the top of my script:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
How do I replace 'TEMP [°C]' with 'TempC'?
I am using Windows 7 and Python 2.7 with Komodo IDE 5.2
I have tried running the suggested code in a Python Shell in Komodo, and there it did change the string:
# -*- coding: utf-8 -*-
line = "hello TEMP [°C]"
line = line.replace("TEMP [°C]", "TempC")
print(line)
hello TempC
Another suggestion, run in the same Python Shell in Komodo, raised this error instead:
line = "TEMP [°C]"
line = line.replace(u"TEMP [°C]", "TempC")
Traceback (most recent call last):
File "<console>", line 0, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 6: ordinal not in range(128)
None of these suggestions worked when reading my text file though.
Based on your symptoms, your Python str literals end up as their utf-8 encodings, so when you type:
"TEMP [°C]"
you actually get:
'TEMP [\xc2\xb0C]'
Your file is in some other encoding (e.g. latin-1 or cp1252), and since you're reading it via plain open, you're getting back undecoded str. But in latin-1 and cp1252 the degree sign is the single byte 0xb0, so the str is 'TEMP [\xb0C]' (note the lack of \xc2), and str comparison doesn't consider the two strings equivalent.
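To see the mismatch concretely, here is a small sketch that encodes the same text both ways and then tries a byte-level replace. The sample line "hello TEMP [°C]" is made up for illustration, and the reprs in the comments are what Python 3 prints (Python 2 shows the same bytes without the b prefix).

```python
# -*- coding: utf-8 -*-
# The same text, "TEMP [°C]", encoded two different ways.
text = u"TEMP [°C]"

utf8_bytes = text.encode("utf-8")     # what a UTF-8 source file stores for the literal
cp1252_bytes = text.encode("cp1252")  # what an "ANSI" (cp1252) data file stores

print(repr(utf8_bytes))    # b'TEMP [\xc2\xb0C]'
print(repr(cp1252_bytes))  # b'TEMP [\xb0C]'

# Hypothetical line as read from the file without decoding.
line_from_file = b"hello " + cp1252_bytes

# The UTF-8 search bytes never occur in the cp1252 line,
# so replace() silently returns the line unchanged.
print(line_from_file.replace(utf8_bytes, b"TempC") == line_from_file)  # True
```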
The best fix is to replace your use of open with io.open, which is the Python 3 version of open: it decodes the file using a given encoding and hands you canonical unicode strings. Similarly, use unicode literals instead of str literals, whose encoding Python doesn't track, so there is no disagreement on the correct way to represent a degree symbol (in unicode, there is one, and only one, representation):
# -*- coding: utf-8 -*-
import io

# io.open decodes the file's bytes, so each line is a unicode string
with io.open('myfile.txt', encoding='cp1252') as f:
    for line in f:
        line = line.replace(u"TEMP [°C]", u"TempC")
As you describe in your edits, your file is likely cp1252 (your editor says it's ANSI, which is just Windows' misleading name for the system's legacy code page, cp1252 on Western-locale systems), hence the chosen encoding.
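As a quick check of why the encoding choice matters, the sketch below decodes the bytes a cp1252 file would hold for that header. The byte string is an assumed sample, not taken from the actual file.

```python
# Bytes as a cp1252 ("ANSI") file would store the header (assumed sample).
raw = b"TEMP [\xb0C]"

# Decoding with the file's real encoding yields the single code point U+00B0.
decoded = raw.decode("cp1252")
print(decoded == u"TEMP [\u00b0C]")  # True

# latin-1 maps byte 0xb0 to the same character, so either would work here.
print(raw.decode("latin-1") == decoded)  # True

# Decoding the same bytes as UTF-8 fails: 0xb0 is not a valid UTF-8 start byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print("utf-8 cannot decode this file: " + exc.reason)
```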
Note: If you're going to use unicode consistently throughout your program (a decent idea if you deal with non-ASCII data), you can make that the default:
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
# All string literals are unicode literals unless prefixed with b, as on Python 3
from io import open  # open is now Python 3's open
# No need to qualify with `io.` for `open`, nor put `u` in front of Unicode text

with open('myfile.txt', encoding='cp1252') as f:
    for line in f:
        line = line.replace("TEMP [°C]", "TempC")
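Really, you should just move to Python 3, where this whole "unicode and str try to work together and often fail" thing was resolved by splitting the two types completely: str is always text, bytes is always raw data, and the two are never silently converted into each other.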
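For comparison, a minimal Python 3 sketch of the whole read, replace, and write loop. It assumes the file really is cp1252 and that writing the result back in the same encoding is what you want; the file names and sample contents are made up, and the script creates its own sample file in a temporary directory so it can run standalone. In Python 3 the built-in open is the same function as io.open, so no import is needed for it.

```python
import os
import tempfile

# Hypothetical file names in a scratch directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "myfile.txt")
dst = os.path.join(workdir, "myfile_fixed.txt")

# Create a small sample file the way an "ANSI" (cp1252) editor would save it.
with open(src, "w", encoding="cp1252") as f:
    f.write("hello TEMP [°C]\nother line\n")

# open(..., encoding=...) decodes to str, so the literal compares correctly.
with open(src, encoding="cp1252") as fin, open(dst, "w", encoding="cp1252") as fout:
    for line in fin:
        fout.write(line.replace("TEMP [°C]", "TempC"))

# Read the result back to confirm the replacement happened.
with open(dst, encoding="cp1252") as f:
    result = f.read()
print(result)
```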