Can anyone explain what causes this for better understanding of the environment?
emacs, unix
input:
with open("example.txt", "r") as f:
    for files in f:
        print files
        split = files.split()
        print split
output:
Hello world
['Hello', 'world']
Hello wörld
['Hello', 'w\xf6rld']
When you print the list, Python shows the string representation (repr) of each element, and the second word contains a non-printable byte. Non-printable bytes (anything outside the ASCII range, or a control character) are displayed as escape sequences.
The point is that you can copy that representation and paste it into Python code or into the interpreter, producing the exact same value.
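A minimal sketch of that round trip, assuming the word from the question is held as a byte string. The names value, shown and rebuilt are illustrative only. Note that Python 3 adds a b prefix to the displayed form, while Python 2 (used in the question) does not.

```python
import ast

value = b"w\xf6rld"                # byte string containing the byte 0xF6
shown = repr(value)                # the text the interpreter displays for it
rebuilt = ast.literal_eval(shown)  # read the displayed text back as a literal

# The escaped form is plain ASCII and reproduces the exact same bytes.
assert "\\xf6" in shown
assert rebuilt == value
print(shown)
```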
The \xf6 escape code represents a byte with hex value F6, which, when interpreted as a Latin-1 byte value, is the ö character.
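To make the byte-to-character mapping concrete, here is a short sketch. It assumes Latin-1 is the right codec for the data, as the explanation above does; the variable names are illustrative.

```python
raw = b"\xf6"                    # one byte with hex value F6
assert ord(raw) == 0xF6          # 246 in decimal

char = raw.decode("latin-1")     # Latin-1 maps byte F6 to code point U+00F6
assert char == u"\u00f6"         # LATIN SMALL LETTER O WITH DIAERESIS

# The same byte on its own is not valid UTF-8, so the choice of codec matters.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
assert utf8_ok is False
```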
You probably want to decode that value to Unicode to handle the data consistently. If you don't yet know what Unicode really is, or want to know anything else about encodings, see:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
The Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder
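The decoding step suggested above could look roughly like the following. This is a sketch, not the only way to do it: it assumes example.txt is Latin-1 encoded (the question does not say), and it writes a throwaway copy of the file so the snippet runs on its own. io.open is available in both Python 2 and Python 3.

```python
import io
import os
import shutil
import tempfile

# Create a temporary Latin-1 file standing in for the question's example.txt.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "example.txt")
with io.open(path, "wb") as f:
    f.write(b"Hello world\nHello w\xf6rld\n")

words = []
# Passing an encoding makes io.open decode each line to Unicode text.
with io.open(path, "r", encoding="latin-1") as f:
    for line in f:
        words.append(line.split())

shutil.rmtree(tmpdir)  # remove the temporary file and directory

assert words == [[u"Hello", u"world"], [u"Hello", u"w\xf6rld"]]
```

In Python 2, printing the whole list will still show the escaped form (u'w\xf6rld'), because lists display the repr of their elements; printing an individual word shows the character itself, provided the terminal can display it.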