I am developing a piece of C code that uses ReadDirectoryChangesW() to monitor changes under a directory in Windows. I have read the related MSDN entries for ReadDirectoryChangesW() and the FILE_NOTIFY_INFORMATION structure, as well as several other pieces of documentation. At this point I have managed to monitor multiple directories with no apparent problems in the monitoring itself. The problem is that the filenames put in the FILE_NOTIFY_INFORMATION structure by this function are not canonical.
According to MSDN they can be in either long or short form. I have found several posts which suggest caching both short and long pathnames to handle this case. Unfortunately, according to my own testing on a Windows 7 system this is not sufficient to eliminate the issue, because there are not just two alternatives for each filename. The problem is that in a pathname EACH COMPONENT can be in either long or short form. The following pathnames could all refer to the same file:
c:\PROGRA~1\MYPROG~1\MYDATA~1.TXT
c:\PROGRA~1\MYPROG~1\MyDataFile.txt
c:\PROGRA~1\MyProgram\MYDATA~1.TXT
c:\PROGRA~1\MyProgram\MyDataFile.txt
c:\Program Files\MYPROG~1\MYDATA~1.TXT
...
and as far as I can tell from my testing using cmd.exe they are all perfectly acceptable. Essentially, the number of valid pathnames for each file rises exponentially with the number of components in its pathname.
Unfortunately, ReadDirectoryChangesW() seems to fill in its output buffer with the filenames as provided to the system call that causes each operation. For example, if you use cmd.exe commands to create, rename, or delete files, the FILE_NOTIFY_INFORMATION will contain the filenames as specified at the command line.
Now, in most cases I could use GetLongPathName() and friends to get a unique path for my use. Unfortunately that cannot be done when deleting files - by the time I get the notification, the file is already gone and the Get*PathName() functions will not work.
At the moment I am thinking about using more extensive caching to determine which alternative pathnames are used by applications for each file, which would handle any case, except for the one where someone decides to delete a file out of the blue using an unseen mixed pathname. And I am thinking about creative data mining from the parent directory modification events and falling back to checking the actual directory for that case.
Any suggestions for an easier way to do this?
PS1: While Change Journals would deal with this effectively (I hope), I do not believe I can use them, due to their ties to NTFS and the lack of administrator privileges for my application. I'd rather not go there unless I am absolutely forced to.
PS2: Please, keep in mind that I code mainly on Unix, so be gentle...
You don't need to cache every combination. It is enough to cache each subpath so that you can convert it to its long form. For example, store this:
C:\PROGRA~1 => c:\Program Files
c:\Program Files\MYPROG~1 => c:\Program Files\MyProgram
c:\Program Files\MyProgram\MYDATA~1.TXT => c:\Program Files\MyProgram\MyDataFile.txt
c:\Program Files\MyProgram\MYDATA~2.TXT => c:\Program Files\MyProgram\MyDataFile2.txt
Now if you get a notification for c:\PROGRA~1\MYPROG~1\MYDATA~1.TXT, split it at every \ and look up each part for its long form.
Don't forget that MyDataFile.txt and MYDATAFILE.TXT also point to the same file, so compare case-insensitively or convert everything to uppercase.
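The lookup described above could be sketched roughly as follows. This is only an illustration in portable C, not a complete implementation: the names `struct alias`, `cache`, `eq_nocase`, `lookup_long` and `to_long_path` are made up for the example, the table is hard-coded with the entries listed above (a real monitor would fill it at runtime), and the case-insensitive comparison handles ASCII only, whereas Windows folds case for non-ASCII names as well.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cache entry: long-form parent plus short-form last
   component, mapped to the fully long-form subpath. */
struct alias {
    const char *short_path;
    const char *long_path;
};

/* Hard-coded for illustration; a real monitor would build this at runtime. */
static const struct alias cache[] = {
    { "c:\\PROGRA~1", "c:\\Program Files" },
    { "c:\\Program Files\\MYPROG~1", "c:\\Program Files\\MyProgram" },
    { "c:\\Program Files\\MyProgram\\MYDATA~1.TXT",
      "c:\\Program Files\\MyProgram\\MyDataFile.txt" },
    { "c:\\Program Files\\MyProgram\\MYDATA~2.TXT",
      "c:\\Program Files\\MyProgram\\MyDataFile2.txt" },
};

/* Case-insensitive equality, ASCII only (an assumption of this sketch). */
static int eq_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Return the cached long form of a subpath, or NULL if it is not cached. */
static const char *lookup_long(const char *path)
{
    size_t i;
    for (i = 0; i < sizeof cache / sizeof cache[0]; i++)
        if (eq_nocase(cache[i].short_path, path))
            return cache[i].long_path;
    return NULL;
}

/* Rebuild `in` one component at a time, replacing the prefix built so far
   with its long form whenever the cache knows it. Returns 0 on success,
   -1 if `out` is too small. UNC paths are not handled. */
int to_long_path(const char *in, char *out, size_t outsz)
{
    size_t len = 0;
    if (outsz == 0)
        return -1;
    out[0] = '\0';
    while (*in) {
        size_t n = strcspn(in, "\\");            /* length of next component */
        const char *longer;
        if (len + (len ? 1 : 0) + n + 1 > outsz)
            return -1;
        if (len)
            out[len++] = '\\';
        memcpy(out + len, in, n);
        len += n;
        out[len] = '\0';
        longer = lookup_long(out);               /* prefix known in short form? */
        if (longer) {
            size_t ll = strlen(longer);
            if (ll + 1 > outsz)
                return -1;
            memcpy(out, longer, ll + 1);
            len = ll;
        }
        in += n;
        if (*in == '\\')
            in++;
    }
    return 0;
}
```

Because every prefix is converted before the next component is appended, one cache entry per short name is enough to cover all the mixed short/long spellings of a path.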
And if c:\PROGRA~1\MYPROG~1\MYDATA~1.TXT is deleted, you can still use GetLongPathName() on the parent directory c:\PROGRA~1\MYPROG~1, as long as that directory still exists.
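A rough sketch of that fallback for deleted files: split off the last component, resolve the parent directory (which should still exist), and glue the leaf back on. The function names `resolve_existing` and `long_path_of_deleted` are invented for this example. On Windows the resolver wraps GetLongPathNameA(); the non-Windows branch is only a stand-in that knows the single directory from the example, so the sketch can be tried elsewhere. Note that the leaf is copied unchanged, so a short leaf name such as MYDATA~1.TXT still has to be looked up in your own cache.

```c
#include <stddef.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>

/* Resolve a path that still exists via the Win32 API. */
static int resolve_existing(const char *path, char *out, size_t outsz)
{
    DWORD n = GetLongPathNameA(path, out, (DWORD)outsz);
    /* 0 means failure; a value >= outsz means the buffer was too small. */
    return (n == 0 || n >= outsz) ? -1 : 0;
}
#else
/* Stand-in for non-Windows builds: knows only the example directory. */
static int resolve_existing(const char *path, char *out, size_t outsz)
{
    static const char short_dir[] = "c:\\PROGRA~1\\MYPROG~1";
    static const char long_dir[]  = "c:\\Program Files\\MyProgram";
    if (strcmp(path, short_dir) != 0 || sizeof long_dir > outsz)
        return -1;
    memcpy(out, long_dir, sizeof long_dir);
    return 0;
}
#endif

/* Build "<long parent>\<leaf>" for a file that no longer exists.
   Returns 0 on success, -1 on any failure. */
int long_path_of_deleted(const char *path, char *out, size_t outsz)
{
    const char *sep = strrchr(path, '\\');   /* start of "\leaf" */
    char parent[260];
    size_t plen, olen, leaflen;

    if (sep == NULL)
        return -1;
    plen = (size_t)(sep - path);
    if (plen >= sizeof parent)
        return -1;
    memcpy(parent, path, plen);
    parent[plen] = '\0';

    if (resolve_existing(parent, out, outsz) != 0)
        return -1;

    olen = strlen(out);
    leaflen = strlen(sep);                   /* includes the backslash */
    if (olen + leaflen + 1 > outsz)
        return -1;
    memcpy(out + olen, sep, leaflen + 1);    /* leaf is copied unchanged */
    return 0;
}
```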