Ok I've got a bit of an interesting problem on my hands. Here's a bit of background:
I'm writing a media library service implementation that will serve URLs to a front-end Flash player. The client wants to be able to push content into the service by uploading files with some metadata into an FTP folder (I have control of the metadata schema). The service watching this folder will pick up any new files, copy them to a "content" folder, then push the metadata and the URLs for the content through the content service and into the database.
The content service isn't a problem, done. Watching an FTP folder is.
My current implementation uses the FileSystemWatcher object with a filter for xml files.
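For reference, a minimal sketch of that setup (the handler body is illustrative; the real one would enqueue the path for the processing thread):

```csharp
using System;
using System.IO;

class WatcherSetup
{
    // Watches a drop folder for *.xml metadata files. The Created handler
    // here just logs; the real service enqueues the path for processing.
    public static FileSystemWatcher CreateWatcher(string dropFolder)
    {
        var watcher = new FileSystemWatcher(dropFolder, "*.xml")
        {
            // Each content item lives in its own subfolder, so watch recursively.
            IncludeSubdirectories = true,
            NotifyFilter = NotifyFilters.FileName | NotifyFilters.LastWrite
        };
        watcher.Created += (sender, e) => Console.WriteLine("New metadata: " + e.FullPath);
        watcher.EnableRaisingEvents = true;
        return watcher;
    }
}
```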
There may be more than one file for each content item e.g. high, med, low quality videos.
I plan to enforce, either by process or by a tool, that each content item is organised into its own folder, but this isn't really an issue.
The xml file will look a bit like this:
<media>
  <meta type="video">
    <name>The Name Displayed</name>
    <heading>The title of the video</heading>
    <description>
      A lengthy description about the video..
    </description>
    <length>00:00:19</length>
  </meta>
  <files>
    <video file="somevideo.flv" quality="low"/>
    <video file="somevideo.flv" quality="medium"/>
    <video file="somevideo.flv" quality="high"/>
  </files>
</media>
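Reading that schema is straightforward; a minimal sketch (element and attribute names match the sample above, but the `MediaItem` class and its properties are my own invention):

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Minimal parse of the metadata schema shown above.
class MediaItem
{
    public string Name;
    public string Heading;
    public string[] Files;

    public static MediaItem Parse(string xml)
    {
        var doc = XDocument.Parse(xml);
        var meta = doc.Root.Element("meta");
        return new MediaItem
        {
            Name = (string)meta.Element("name"),
            Heading = (string)meta.Element("heading"),
            // One entry per <video> element, e.g. low/medium/high quality.
            Files = doc.Root.Element("files")
                       .Elements("video")
                       .Select(v => (string)v.Attribute("file"))
                       .ToArray()
        };
    }
}
```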
So when a new file is created the FileSystemWatcher.Created event fires. I've got a separate thread running to process the content which shares a queue with the main service process (don't worry, it's using the producer-consumer pattern as detailed here: http://msdn.microsoft.com/en-us/library/yy12yx1f.aspx).
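The shared queue in the style of that MSDN article can be sketched like this (a bare-bones Monitor-based version; the real work items would carry more than a path):

```csharp
using System.Collections.Generic;
using System.Threading;

// Minimal producer/consumer queue using Monitor.Wait/Pulse, as in the
// linked MSDN sample. The watcher thread enqueues, the processor dequeues.
class WorkQueue<T>
{
    readonly Queue<T> _items = new Queue<T>();
    readonly object _sync = new object();

    public void Enqueue(T item)
    {
        lock (_sync)
        {
            _items.Enqueue(item);
            Monitor.Pulse(_sync);    // wake one waiting consumer
        }
    }

    public T Dequeue()
    {
        lock (_sync)
        {
            while (_items.Count == 0)
                Monitor.Wait(_sync); // block until a producer pulses
            return _items.Dequeue();
        }
    }
}
```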
This all works fine, but now I'm running into edge cases left right and centre!
I've taken into account that the videos will take longer to upload, so the processor will try to get an exclusive lock; if that fails it will move the item to the back of the queue and move on to the next item.
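That lock check can be done by attempting to open the file with no sharing; a sketch (the method name is my own):

```csharp
using System.IO;

static class UploadCheck
{
    // Returns true once the uploader has released the file: the
    // "try an exclusive lock, requeue on failure" step described above.
    public static bool IsUploadComplete(string path)
    {
        try
        {
            using (File.Open(path, FileMode.Open, FileAccess.Read, FileShare.None))
                return true;   // nobody else has it open
        }
        catch (IOException)
        {
            return false;      // still being written; move to back of queue
        }
    }
}
```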
Can anyone recommend a best practice for this scenario? Is the FileSystemWatcher a good idea, or should the service just periodically scan the folder?
Edit: Just to give you some idea of scale. We're talking 10s of 1000s of items in total. Probably uploaded in large chunks.
FileSystemWatcher is a pragmatic way of getting early visibility of a file drop, but there are a lot of edge-cases that can cause missed events. You would still need a periodic sweep to be sure.
To avoid many of the locking issues: could the clients upload as foo.xml.upload, and then rename to foo.xml once the upload is complete? That would avoid a lengthily locked file... (you just ignore the .upload ones).
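That convention is trivial to honour on the service side; a sketch (the helper name is my own):

```csharp
using System;
using System.IO;

static class DropFilter
{
    // Only fully uploaded metadata files carry a plain .xml extension;
    // anything still named foo.xml.upload is in flight and is skipped.
    public static bool ShouldProcess(string path)
    {
        return string.Equals(Path.GetExtension(path), ".xml",
                             StringComparison.OrdinalIgnoreCase);
    }
}
```

Note that the rename raises the watcher's Renamed event rather than Created, so the service would need to subscribe to both.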
Also: beware "not invented here"... there are many existing utilities that monitor folders; BizTalk for (an overkill) example. I'm not saying "don't do it yourself", but at least consider pre-canned options.
A common pitfall when watching for a file creation is that you may receive your event before the upload has completed, so you end up processing a file that is incomplete.
Of course you can check for that, since the xml of an incomplete file will not be well formed. But a better solution IMO is to have the uploader write to a temp file, and after completion of the upload have it rename the file to *.xml.
As for the FileSystemWatcher itself, well... I have come to distrust it when it is used on a network share (the delete event never got fired). So I wrote my own 'watcher' with a similar interface to the FSW, which just polls the directory for file creation/deletion/change. Not as elegant as the FSW, but more reliable.
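A polling watcher along those lines amounts to diffing directory snapshots; a bare-bones sketch (class shape, event signature, and the caller-driven Poll() are all illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// A minimal polling "watcher": instead of relying on FileSystemWatcher
// events, it compares directory snapshots and raises Created for new files.
class PollingWatcher
{
    readonly string _path;
    readonly string _pattern;
    HashSet<string> _known = new HashSet<string>();

    public event Action<string> Created = delegate { };

    public PollingWatcher(string path, string pattern)
    {
        _path = path;
        _pattern = pattern;
    }

    // Call on a timer (e.g. every few seconds) from the service.
    public void Poll()
    {
        var current = new HashSet<string>(
            Directory.GetFiles(_path, _pattern, SearchOption.AllDirectories));
        foreach (var file in current.Except(_known))
            Created(file);   // anything not seen last time is "new"
        _known = current;
    }
}
```

Deletion and change detection follow the same pattern: files in the old snapshot but not the new one were deleted, and comparing last-write times catches changes.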