
Python Windows cannot stat files with invalid characters

I'm trying to make a quick Python script to rename a bunch of files. These files were made in a Linux system on this NTFS drive, but I'm now on Windows. The naming convention looks like this:

Screenshot at 2016-12-11 21:12:56.png

The : character is illegal in Windows filenames, so the behaviour of this script is a little strange to me.

import os

# Try to rename every file in the current directory, replacing ":" with "-".
for i in os.listdir("."):
    print(i)
    x = i.replace(":", "-")
    comm = """mv "{}" "{}" """.format(i, x)
    os.system(comm)

In the above code, the print(i) prints the filenames happily. However when I try to run os.system(comm) to rename my files, I get this error:

mv: cannot stat ‘Screenshot at 2016-12-24 14:54:57.png’: No such file or directory

Firstly, I find it a little strange that Python under Windows can tell that these naughty files exist, but isn't able to actually move them. Secondly, what's the best way to get around this issue?

I've also tried shutil.move() and os.rename() with no luck. This SO question seems to discuss the issue, but seems more concerned with prevention than fixing it. I could obviously switch back to Linux and fix it, but I'm wondering if I can't fix it on Windows.

Daniel Porteous asked Dec 24 '16 10:12



1 Answer

You can list them because their names are stored in the directory. You can't access them because Windows parses the colon specially in a path (it separates drive letters and NTFS alternate data stream names), so the files cannot be reached by the common path-based functions, including MoveFile. You basically have two options: find a method that doesn't rely on the name, like OpenFileById, or find an alternate name for the file, as shown by dir /x. The latter gives you the short (8.3) name, which should not contain any colons. I don't know of a ready-made function to access those names from Python, so the shortest workaround that is clear to me is to execute dir /x and parse its output.

I think paths relative to directory descriptors are as close as Python's standard library gets to the first method, but I don't know whether that would be enough. The underlying FindFirstFile/FindNextFile functions do return both names in WIN32_FIND_DATA (cFileName and cAlternateFileName), but Python expects the first one to be valid. Either method would also have made sense in PowerShell, but it appears to be wholly unaware of short names and also tracks files by name, not by ID. Otherwise FileInfo.MoveTo would have done the trick neatly.
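As a rough, untested sketch of reading both names through ctypes: the helper name list_short_names is hypothetical, and the code only assumes the documented FindFirstFileW/FindNextFileW/FindClose calls and the WIN32_FIND_DATAW structure from ctypes.wintypes. I have not confirmed that it helps with colon-containing names; cAlternateFileName is empty whenever no short name was ever generated for a file.

```python
import ctypes
import os


def list_short_names(directory):
    """Return [(long_name, short_name), ...] for entries in `directory`.

    Hypothetical helper; Windows only. short_name is "" when the file
    has no 8.3 alternate name.
    """
    if os.name != "nt":
        raise OSError("FindFirstFileW is only available on Windows")

    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    find_data_ptr = ctypes.POINTER(wintypes.WIN32_FIND_DATAW)
    kernel32.FindFirstFileW.argtypes = [wintypes.LPCWSTR, find_data_ptr]
    kernel32.FindFirstFileW.restype = wintypes.HANDLE
    kernel32.FindNextFileW.argtypes = [wintypes.HANDLE, find_data_ptr]
    kernel32.FindNextFileW.restype = wintypes.BOOL
    kernel32.FindClose.argtypes = [wintypes.HANDLE]
    kernel32.FindClose.restype = wintypes.BOOL

    invalid_handle = wintypes.HANDLE(-1).value
    data = wintypes.WIN32_FIND_DATAW()
    # The wildcard pattern contains no colon, so the search itself is valid.
    pattern = os.path.join(directory, "*")
    handle = kernel32.FindFirstFileW(pattern, ctypes.byref(data))
    if handle == invalid_handle:
        raise ctypes.WinError(ctypes.get_last_error())

    names = []
    try:
        while True:
            if data.cFileName not in (".", ".."):
                names.append((data.cFileName, data.cAlternateFileName))
            if not kernel32.FindNextFileW(handle, ctypes.byref(data)):
                break
    finally:
        kernel32.FindClose(handle)
    return names
```

If a short name does come back, it could in principle be passed to os.rename in place of the long name, but whether that works for these files is exactly the open question.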

To prevent this situation in the first place, ntfs-3g supports a windows_names mount option. With it set, ntfs-3g refuses to create files whose names Windows would reject.
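For scripts that generate filenames on Linux, a similar check can be done before the file is ever written. This is only a sketch of Windows' documented naming rules (reserved characters, control characters, trailing space or dot, reserved device names); it is not ntfs-3g's implementation, and is_windows_safe is a hypothetical name.

```python
import re

# Characters Windows rejects in a file name, plus ASCII control characters.
_ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Reserved device names, which are rejected with or without an extension.
_RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {"COM{}".format(i) for i in range(1, 10)}
    | {"LPT{}".format(i) for i in range(1, 10)}
)


def is_windows_safe(name):
    """Return True if `name` (a single path component) is valid on Windows."""
    if not name or _ILLEGAL_CHARS.search(name):
        return False
    if name[-1] in " .":
        # Windows does not allow names ending in a space or a dot.
        return False
    stem = name.split(".")[0].upper()
    return stem not in _RESERVED_NAMES


print(is_windows_safe("Screenshot at 2016-12-11 21:12:56.png"))
print(is_windows_safe("Screenshot at 2016-12-11 21-12-56.png"))
```

The first call reports the screenshot name from the question as unsafe because of the colons; the second, with dashes, passes.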

Conclusion: as discussed in https://superuser.com/questions/31587/how-to-force-windows-to-rename-a-file-with-a-special-character there is no clear solution. All of the methods I attempted (and a handful of others) have been discussed there. Probably the least messy option is to mount the disk in Linux again and rename the files from there; the filesystem is technically corrupt because the characters are invalid, but Microsoft's repair solution is deletion, not renaming.
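Run from Linux with the drive mounted, the rename itself is simple. A minimal sketch, assuming the dash replacement from the question; rename_colon_files is a hypothetical helper, and it skips a file rather than overwrite an existing target.

```python
import os


def rename_colon_files(directory, replacement="-"):
    """Rename entries in `directory` whose names contain ':'.

    Returns a list of (old_name, new_name) pairs that were renamed.
    """
    renamed = []
    for name in sorted(os.listdir(directory)):
        if ":" not in name:
            continue
        new_name = name.replace(":", replacement)
        src = os.path.join(directory, name)
        dst = os.path.join(directory, new_name)
        if os.path.exists(dst):
            # Do not clobber a file that already has the target name.
            continue
        os.rename(src, dst)
        renamed.append((name, new_name))
    return renamed


if __name__ == "__main__":
    for old, new in rename_colon_files("."):
        print("{!r} -> {!r}".format(old, new))
```

Using os.rename directly avoids the quoting problems of building an mv command line for os.system.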

Cygwin merely emulates the colon by substituting a private-use Unicode character (ord(':') + 0xF000, i.e. U+F03A).
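The mapping is just an offset into the private-use area. A small sketch of the idea, handling only the colon; the function names are hypothetical, and this illustrates the arithmetic rather than Cygwin's actual code.

```python
PRIVATE_USE_OFFSET = 0xF000


def to_private_use(name, chars=":"):
    """Replace each character in `chars` with its private-use counterpart."""
    return "".join(
        chr(ord(c) + PRIVATE_USE_OFFSET) if c in chars else c for c in name
    )


def from_private_use(name, chars=":"):
    """Undo to_private_use for the same set of characters."""
    reverse = {chr(ord(c) + PRIVATE_USE_OFFSET): c for c in chars}
    return "".join(reverse.get(c, c) for c in name)


encoded = to_private_use("21:12:56.png")
print(hex(ord(encoded[2])))  # 0xf03a
print(from_private_use(encoded))  # 21:12:56.png
```

A name stored this way is a legal Windows filename, which is why such files look like they contain a colon under Cygwin while showing an unassigned glyph elsewhere.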

Yann Vernier answered Oct 22 '22 14:10
