
Python: How to move a file with unicode filename to a unicode folder

I'm having a hard time moving a file with a Unicode name between folders with Unicode names in a Python script under Windows...

What syntax would you use to find all files of type *.ext in a folder and move them to a relative location?

Assume file and folder names are Unicode.

Asked by Jonathan Livni, Apr 02 '11 13:04



2 Answers

The basic problem is mixing Unicode strings and byte strings without converting between them. The fix is either to convert everything to a single string type or to sidestep the mix with a trick. All of my solutions use the glob and shutil modules from the standard library.

For the sake of example, I have some Unicode file names ending with .ods, and I want to move them to a subdirectory called א (Hebrew Aleph, a Unicode character).

First solution - express directory name as byte string:

>>> import glob
>>> import shutil
>>> files=glob.glob('*.ods')      # List of Byte string file names
>>> for file in files:
...     shutil.copy2(file, 'א')   # Byte string directory name
... 

Second solution - convert the file names to Unicode:

>>> import glob
>>> import shutil
>>> files=glob.glob(u'*.ods')     # List of Unicode file names
>>> for file in files:
...     shutil.copy2(file, u'א')  # Unicode directory name

Credit to Ezio Melotti on the Python bug tracker.
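
Both solutions rest on glob returning names in the same string type as the pattern it receives. Here is a minimal sketch of that behavior, written so that it should run on Python 2.6+ and Python 3 alike. The helper name find_files is hypothetical and is not part of glob or shutil.

```python
import glob
import os


def find_files(directory, pattern):
    """List paths in `directory` that match `pattern`.

    Both arguments must be the same string type (both Unicode or both
    byte strings). glob returns paths of that same type, so nothing is
    mixed later on when the paths are joined or copied.
    """
    return sorted(glob.glob(os.path.join(directory, pattern)))
```

Calling find_files(u'.', u'*.ods') should give Unicode names, while find_files(b'.', b'*.ods') should give byte strings. Passing one argument of each type is the mix that the solutions above avoid.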

Third solution - avoid the Unicode destination directory name:

Although this isn't the best solution in my opinion, there is a nice trick here that's worth mentioning.

Change your working directory to the destination directory using os.chdir(), and then copy the files into it by referring to it as . (the current directory):

# -*- coding: utf-8 -*-
import os
import shutil
import glob

os.chdir('א')                   # CD to the destination Unicode directory
print(os.getcwd())              # DEBUG: Make sure you're in the right place
files = glob.glob('../*.ods')   # List of byte string file names
for file in files:
    shutil.copy2(file, '.')     # Copy each file into the current directory
# Don't forget to go back to the original directory here, if it matters
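
If going back to the original directory does matter, one way to make that automatic is a small context manager. This is only a sketch of the same chdir trick: working_directory and copy_into are hypothetical names, and the source paths are made absolute before changing directory so that no '../' prefix is needed. It assumes the pattern is a text string when run on Python 3.

```python
import contextlib
import glob
import os
import shutil


@contextlib.contextmanager
def working_directory(path):
    """Temporarily chdir into `path`, then restore the previous directory."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)  # Runs even if a copy fails


def copy_into(dest_dir, pattern):
    """Copy files matching `pattern` (relative to the cwd) into `dest_dir`."""
    # Resolve the sources first, while the cwd is still the original one.
    sources = [os.path.abspath(p) for p in glob.glob(pattern)]
    with working_directory(dest_dir):
        for src in sources:
            shutil.copy2(src, '.')  # '.' is the destination directory here
    return sources
```

With the files from the example, copy_into(u'א', u'*.ods') would copy them and leave the working directory where it was.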

Deeper explanation

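The straightforward approach shutil.copy2(src, dest) fails because shutil calls os.path.join, which concatenates the Unicode destination with a byte-string file name. Python 2 then tries to decode the byte string implicitly as ASCII, which fails on the first non-ASCII byte: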

>>> files=glob.glob('*.ods')
>>> for file in files:
...     shutil.copy2(file, u'א')
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/lib/python2.6/shutil.py", line 98, in copy2
    dst = os.path.join(dst, os.path.basename(src))
  File "/usr/lib/python2.6/posixpath.py", line 70, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd7 in position 1: 
                    ordinal not in range(128)

As seen before, this can be avoided by using the byte string 'א' instead of the Unicode u'א', so that both operands are byte strings.
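
The opposite fix, decoding every byte string explicitly before joining, can be sketched as follows. The helpers to_unicode and join_unicode are hypothetical names. The default of sys.getfilesystemencoding() is an assumption about how the byte names were encoded and may not hold on every system.

```python
import os
import sys


def to_unicode(name, encoding=None):
    """Decode a byte-string name explicitly; return Unicode names unchanged."""
    if isinstance(name, bytes):  # On Python 2, bytes is an alias of str
        return name.decode(encoding or sys.getfilesystemencoding())
    return name


def join_unicode(directory, filename, encoding=None):
    """Join two path parts after converting both to Unicode."""
    return os.path.join(to_unicode(directory, encoding),
                        to_unicode(filename, encoding))
```

Because both parts are Unicode by the time os.path.join sees them, no implicit ASCII decoding takes place.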

Is this a bug?

In my opinion, this is a bug, because Python cannot expect directory names to always be str rather than unicode. I have reported this as an issue on the Python bug tracker and am waiting for responses.

Further reading

Python's official Unicode HOWTO

Answered by Adam Matan, Nov 12 '22 04:11



Use Unicode strings everywhere:

# -*- coding: utf-8 -*-
# source code ^^ encoding; it might be different from sys.getfilesystemencoding()
import glob
import os

srcdir = u'مصدر الدليل' # <-- unicode string
dstdir = os.path.join(u'..', u'κατάλογο προορισμού') # relative path, also Unicode
for path in glob.glob(os.path.join(srcdir, u'*.ext')):
    newpath = os.path.join(dstdir, os.path.basename(path))
    os.rename(path, newpath) # move file or directory; assume the same filesystem

There are many subtle details in moving files; see the shutil.copy* functions. You could use whichever one is appropriate for your particular case and remove the source files on success, e.g., via os.remove().
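
As one possible variant, shutil.move can stand in for os.rename: it uses os.rename when the destination is on the same filesystem and otherwise copies the file and then removes the source. A minimal sketch, where move_matching is a hypothetical helper name and all arguments are assumed to be Unicode strings:

```python
import glob
import os
import shutil


def move_matching(srcdir, pattern, dstdir):
    """Move every file in `srcdir` matching `pattern` into `dstdir`.

    Returns the list of new paths. Pass Unicode strings for all three
    arguments so that no byte strings are mixed in.
    """
    moved = []
    for path in sorted(glob.glob(os.path.join(srcdir, pattern))):
        newpath = os.path.join(dstdir, os.path.basename(path))
        shutil.move(path, newpath)  # rename, or copy + delete across filesystems
        moved.append(newpath)
    return moved
```

For the example above this would be called as move_matching(srcdir, u'*.ext', dstdir).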

Answered by jfs, Nov 12 '22 03:11
