Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert python filenames to unicode

Tags:

python

unicode

I am on python 2.6 for Windows.

I use os.walk to read a file tree. Files may have non-7-bit characters (German "ae" for example) in their filenames. These are encoded in Pythons internal string representation.

I am processing these filenames with Python library functions and that fails due to wrong encoding.

How can I convert these filenames to proper (unicode?) python strings?

I have a file "d:\utest\ü.txt". Passing the path as unicode does not work:

>>> list(os.walk('d:\\utest'))
[('d:\\utest', [], ['\xfc.txt'])]
>>> list(os.walk(u'd:\\utest'))
[(u'd:\\utest', [], [u'\xfc.txt'])]
like image 476
Bernd Avatar asked Jun 27 '09 06:06

Bernd


People also ask

How are unicode strings translated into bytes in Python?

The rules for translating a Unicode string into a sequence of bytes are called a character encoding, or just an encoding. The first encoding you might think of is using 32-bit integers as the code unit, and then using the CPU’s representation of 32-bit integers. In this representation, the string “Python” might look like this:

Does Python support Unicode characters?

Python’s Unicode Support¶. Now that you’ve learned the rudiments of Unicode, we can look at Python’s Unicode features. Since Python 3.0, the language’s str type contains Unicode characters, meaning any string created using "unicode rocks!", 'unicode rocks!', or the triple-quoted string syntax is stored as Unicode.

How do I change the encoding of a file with Unicode?

When opening a file for reading or writing, you can usually just provide the Unicode string as the filename, and it will be automatically converted to the right encoding for you: Functions in the os module such as os.stat () will also accept Unicode filenames.

What is the difference between Unicode and byte file names?

If you pass a Unicode string as the path, filenames will be decoded using the filesystem’s encoding and a list of Unicode strings will be returned, while passing a byte path will return the filenames as bytes. For example, assuming the default filesystem encoding is UTF-8, running the following program:


1 Answers

If you pass a Unicode string to os.walk(), you'll get Unicode results:

>>> list(os.walk(r'C:\example'))          # Passing an ASCII string
[('C:\\example', [], ['file.txt'])]
>>> 
>>> list(os.walk(ur'C:\example'))        # Passing a Unicode string
[(u'C:\\example', [], [u'file.txt'])]
like image 175
RichieHindle Avatar answered Sep 28 '22 12:09

RichieHindle