I am familiar with the FileSystemWatcher class, and have tested using this, alternatively I have tested using a fast loop and doing a directory listing of files of type in a directory. In this particular case they are zip compressed SDF files, I need to decompress, open, and query.
The problem is that when a large file is put in a directory, sometimes that takes time, such as it being downloaded, or copied from a network location, etc...
When the FileSystemWatcher raises an OnChange event, I have a handle to the ChangeType and on these types of operations the Create is immediate, while the file is still not completely copied to the location.
Likewise using the loop, I see a file is there, before the whole file is there.
The FileSystemWatcher raises several change events, one after create, and then one or more during the copy, nothing that says This file is now complete
So if I am expecting files of a type, to be placed in a directory ultimately to read and processed, with no knowledge of their transport mechanism, and no knowledge of their final size...
How do I know when the file is ready to actually be processed other than with using error control as a workflow control (albeit the error control is there anyway as it should be)? This just seems like a bad way to have to handle this, as sometimes the error control may actually be representing a legitimate issue, sometimes it may just be that the file is not completely written, and I do not see any real safe way to differentiate.
I despise anticipated error, but realize that is has its place like sockets, nothing guarantees a check for open does not change before an attempt to read/write. But I do avoid it at all costs.
This particular one troubles me mostly because of the ambiguity of the message that will be produced. There is a conflict queue for files that legitimately error because they did not come across entirely or are otherwise corrupt, I do not want otherwise good files going there. Getting more granular to detect this specific case will be almost impossible.
edit: I know I can do this... And I have read the other SA articles concerning others doing the same thing. (And I know this method is both crude and blocking, it is just an example.)
private static void OnChanged(object source, FileSystemEventArgs e)
{
if (e.ChangeType == WatcherChangeTypes.Created)
{
bool ready = false;
while (!ready)
{
try
{
using (FileStream fs = new FileStream(e.FullPath, FileMode.Open))
{
Console.WriteLine(String.Format("{0} - {1}", e.FullPath, fs.Length));
}
ready = true;
}
catch (IOException)
{
ready = false;
}
}
}
}
What I am trying to find out is this definitively the only way, is there no other component, or some hook to the file system that will actually do this with a proper event?
The only way to tell is to open the file with FileShare.Read
. That will always fail if the process is still writing to the file and hasn't closed it yet. There is otherwise no mechanism to know anything at all about which particular process is doing the writing, FSW operates at the file system device driver level and doesn't know anything about what process is performing the operation. Could be more than one.
That will very often fail the first time you try, FSW is very efficient. In general you have no idea how much time the process will take, it of course depends on how it is written and might leave the file opened for a while. Could be hours or days, a log file would be an example.
So you need a re-try mechanism, it should have an exponential back-off algorithm to increase the re-try delays between attempts. Start it off at, say, a half second delay and keep increasing that delay when it fails. This needs to be done in a worker thread, not the FSW callback. Use a thread-safe queue to pass the path of the file from the FSW callback to the worker thread. Also in general a good strategy to deal with the multiple FSW notifications you get.
Watch out for startup effects, you of course missed any notification before you started running so there might be a load of files that are waiting for work. And watch out for Heisenbugs, whatever you do with the file might cause another process to fall over. Much like this process did to yours :)
Consider that a batch-style program that you periodically run with the task scheduler could be an easier alternative.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With