
Python lists with Scandinavian letters

Can anyone explain what causes this? I would like to understand the environment better.

emacs, unix

input:

with open("example.txt", "r") as f:
    for files in f:
        print files
        split = files.split()
        print split

output:

Hello world
['Hello', 'world']
Hello wörld
['Hello', 'w\xf6rld']
jester112358 asked Dec 15 '22 09:12


1 Answer

Printing a list shows the string representation (the repr) of each element, and here that representation includes a non-printable byte. Non-printable bytes (control characters and anything outside the ASCII range) are displayed as escape sequences.

The point is that you can copy that representation and paste it into Python code or into the interpreter, producing the exact same value.

The \xf6 escape code represents a byte with hex value F6, which, when interpreted as a Latin-1 byte value, is the ö character.
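Both points can be checked in a few lines. This is only a minimal sketch: the variable names are illustrative, and the `b` prefix is used so the snippet also runs on Python 3 (on Python 2 a plain `str` is already a byte string).

```python
# A byte string with the single byte 0xF6 between "w" and "rld".
raw = b"w\xf6rld"

# repr() produces the escaped, paste-able form of the value;
# evaluating that text rebuilds an identical byte string.
shown = repr(raw)
assert eval(shown) == raw

# Interpreted as Latin-1, byte 0xF6 maps to code point U+00F6.
text = raw.decode("latin-1")
assert text == u"w\u00f6rld"

# One byte became one character, so the lengths match.
assert len(raw) == len(text) == 5
```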

You probably want to decode that value to Unicode to handle the data consistently. If you don't yet know what Unicode really is, or want to know anything else about encodings, see:

  • The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky

  • The Python Unicode HOWTO

  • Pragmatic Unicode by Ned Batchelder
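As a rough illustration of decoding while reading, the sketch below uses `io.open`, which accepts an `encoding` argument on both Python 2 and Python 3. The Latin-1 encoding is an assumption about how example.txt was saved, and the sample file is created in a temporary directory purely so the snippet is self-contained.

```python
import io
import os
import tempfile

# Create a small sample file. Latin-1 is assumed here; use whatever
# encoding the real example.txt was actually saved in.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with io.open(path, "wb") as f:
    f.write(b"Hello world\nHello w\xf6rld\n")

# io.open decodes each line to Unicode text as it is read,
# so split() returns Unicode words rather than raw bytes.
rows = []
with io.open(path, "r", encoding="latin-1") as f:
    for line in f:
        rows.append(line.split())

# Print a single word rather than the whole list to see the character itself.
print(rows[1][1])
```

On Python 2, printing the list itself would still show an escaped form (u'w\xf6rld'), because a list always displays the repr of its elements; printing an individual word shows the character.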

Martijn Pieters answered Dec 29 '22 19:12