I am on python 2.6 for Windows.
I use os.walk to read a file tree. Files may have non-7-bit characters (German "ae" for example) in their filenames. These are encoded in Pythons internal string representation.
I am processing these filenames with Python library functions and that fails due to wrong encoding.
How can I convert these filenames to proper (unicode?) python strings?
I have a file "d:\utest\ü.txt". Passing the path as unicode does not work:
>>> list(os.walk('d:\\utest'))
[('d:\\utest', [], ['\xfc.txt'])]
>>> list(os.walk(u'd:\\utest'))
[(u'd:\\utest', [], [u'\xfc.txt'])]
The rules for translating a Unicode string into a sequence of bytes are called a character encoding, or just an encoding. The first encoding you might think of is using 32-bit integers as the code unit, and then using the CPU’s representation of 32-bit integers. In this representation, the string “Python” might look like this:
Python’s Unicode Support¶. Now that you’ve learned the rudiments of Unicode, we can look at Python’s Unicode features. Since Python 3.0, the language’s str type contains Unicode characters, meaning any string created using "unicode rocks!", 'unicode rocks!', or the triple-quoted string syntax is stored as Unicode.
When opening a file for reading or writing, you can usually just provide the Unicode string as the filename, and it will be automatically converted to the right encoding for you: Functions in the os module such as os.stat () will also accept Unicode filenames.
If you pass a Unicode string as the path, filenames will be decoded using the filesystem’s encoding and a list of Unicode strings will be returned, while passing a byte path will return the filenames as bytes. For example, assuming the default filesystem encoding is UTF-8, running the following program:
If you pass a Unicode string to os.walk()
, you'll get Unicode results:
>>> list(os.walk(r'C:\example')) # Passing an ASCII string
[('C:\\example', [], ['file.txt'])]
>>>
>>> list(os.walk(ur'C:\example')) # Passing a Unicode string
[(u'C:\\example', [], [u'file.txt'])]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With