
Graceful File Reading without Locking

Whiteboard Overview

The images below are 1000 x 750 px, ~130 kB JPEGs hosted on ImageShack.

  • Internal
  • Global

Additional Information

I should mention that each user (of the client boxes) will be working straight off the /Foo share. Due to the nature of the business, users will never need to see or work on each other's documents concurrently, so conflicts of this nature will never be a problem. Access needs to be as simple as possible for them, which probably means mapping a drive to their respective /Foo/username sub-directory.

Additionally, no one but my applications (in-house and the ones on the server) will be using the FTP directory directly.


Possible Implementations

Unfortunately, it doesn't look like I can use off-the-shelf tools such as WinSCP, because some other logic needs to be intimately tied into the process.

I figure there are two simple ways for me to accomplish the above on the in-house side.

  1. Method one (slow):

    • Walk the /Foo directory tree every N minutes.

    • Diff with previous tree using a combination of timestamps (can be faked by file copying tools, but not relevant in this case) and check-summation.

    • Merge changes with off-site FTP server.

  2. Method two:

    • Register for directory change notifications (e.g., using ReadDirectoryChangesW from the WinAPI, or FileSystemWatcher if using .NET).

    • Log changes.

    • Merge changes with off-site FTP server every N minutes.

I'll probably end up using something like the second method due to performance considerations.
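For what it's worth, the tree-walk-and-diff logic of method one is simple to prototype. Here is a minimal sketch in Python (purely illustrative; the same logic ports directly to C#, C++, or Perl), using MD5 as an arbitrary checksum choice:

```python
# Sketch of "method one": walk the tree, checksum each file, and diff
# against the previous snapshot. MD5 and the dict layout are illustrative.
import hashlib
import os

def snapshot(root):
    """Map each relative file path under root to (mtime, checksum)."""
    snap = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            with open(full, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
            snap[rel] = (os.path.getmtime(full), digest)
    return snap

def diff(old, new):
    """Return (added, removed, modified) lists of relative paths."""
    added = [p for p in new if p not in old]
    removed = [p for p in old if p not in new]
    # Timestamps can be faked by copy tools, so compare checksums too.
    modified = [p for p in new if p in old and old[p][1] != new[p][1]]
    return added, removed, modified
```

The `modified` check deliberately trusts the checksum rather than the timestamp, matching the caveat above about timestamps being fakeable.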


Problem

Since this synchronization must take place during business hours, the first problem that arises is during the off-site upload stage.

While I'm transferring a file off-site, I effectively need to prevent the users from writing to the file (e.g., use CreateFile with FILE_SHARE_READ or something) while I'm reading from it. The internet upstream speeds at their office are nowhere near symmetrical to the file sizes they'll be working with, so it's quite possible that they'll come back to the file and attempt to modify it while I'm still reading from it.
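On Windows the sharing mode is fixed at open time: `CreateFile` with `dwShareMode = FILE_SHARE_READ` lets other readers open the file but makes a writer's open fail while your handle is live. As a rough sketch of the same idea (in Python for brevity, and using POSIX advisory locks, which only constrain cooperating processes, unlike Windows sharing modes):

```python
# POSIX advisory-lock sketch: hold a shared (read) lock while reading the
# file, so a cooperating writer taking an exclusive lock blocks until done.
# Advisory only -- unlike CreateFile sharing modes, this does NOT stop a
# non-cooperating process (e.g. Word) from writing.
import fcntl

def read_locked(path):
    """Read the whole file while holding a shared advisory lock."""
    with open(path, "rb") as f:
        fcntl.flock(f, fcntl.LOCK_SH)   # shared: other readers still allowed
        try:
            return f.read()             # e.g. stream this to the FTP server
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```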


Possible Solution

The easiest solution to the above problem would be to create a copy of the file(s) in question elsewhere on the file-system and transfer those "snapshots" without disturbance.

The files (some will be binary) that these guys will be working with are relatively small, probably ≤20 MB, so copying (and therefore temporarily locking) them will be almost instant. The chances of them attempting to write to the file in the same instant that I'm copying it should be close to nil.
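The snapshot idea reduces to "copy fast, upload slow." A minimal sketch (Python for illustration; `staging_dir` is a hypothetical location, and the actual FTP upload is out of scope here):

```python
# Snapshot-copy sketch: briefly copy the file into a staging area, then
# upload the copy at leisure while the user gets the original back.
import os
import shutil

def stage_for_upload(path, staging_dir):
    """Copy `path` into `staging_dir`, preserving metadata; return the copy's path."""
    os.makedirs(staging_dir, exist_ok=True)
    dest = os.path.join(staging_dir, os.path.basename(path))
    shutil.copy2(path, dest)  # the only window in which the source is held open
    return dest
```

The source file is only held open for the duration of `copy2`, which for files of 20 MB or less should indeed be near-instant on a LAN share.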

This solution seems kind of ugly, though, and I'm pretty sure there's a better way to handle this type of problem.

One thing that comes to mind is something like a file system filter that takes care of the replication and synchronization at the IRP level, kind of like what some A/Vs do. This is overkill for my project, however.


Questions

This is the first time that I've had to deal with this type of problem, so perhaps I'm overthinking it.

I'm interested in clean solutions that don't require going overboard with the complexity of their implementations. Perhaps I've missed something in the WinAPI that handles this problem gracefully?

I haven't decided what I'll be writing this in, but I'm comfortable with: C, C++, C#, D, and Perl.

Asked by fulcrum on Feb 19 '11

1 Answer

After the discussions in the comments my proposal would be like so:

  • Create a partition on your data server, about 5GB for safety.
  • Create a Windows Service project in C# that would monitor your data drive / location.
  • When a file has been modified, create a local copy of it, mirroring the same directory structure, and place it on the new partition.
  • Create another service that would do the following:
    • Monitor bandwidth usage.
    • Monitor file creations on the temporary partition.
    • Transfer several files at a time (use threading) to your FTP server, abiding by the bandwidth available at the time, decreasing / increasing the worker threads depending on network traffic.
    • Remove the files from the partition that have transferred successfully.

So basically you have your drives:

  • C: Windows Installation
  • D: Share Storage
  • X: Temporary Partition

Then you would have the following services:

  • LocalMirrorService - Watches D: and copies to X: with the dir structure
  • TransferClientService - Moves files from X: to ftp server, removes from X:
    • Also uses multiple threads to move several files at once, and monitors bandwidth.
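The core of that TransferClientService loop can be sketched as follows (Python for illustration, since the real service would be C#; `upload` is a hypothetical stand-in for the actual FTP put, and the bandwidth throttling is omitted):

```python
# Sketch of the TransferClientService loop: push staged files with a small
# worker pool, deleting each file only after its transfer has succeeded,
# so failed transfers stay on the partition for the next pass.
import os
from concurrent.futures import ThreadPoolExecutor

def drain_staging(staging_dir, upload, workers=4):
    """Upload every file under staging_dir; remove those that succeed."""
    paths = [os.path.join(staging_dir, n) for n in os.listdir(staging_dir)]

    def transfer(path):
        upload(path)      # should raise on failure, keeping the file for retry
        os.remove(path)
        return path

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transfer, paths))
```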

I would bet that this is the idea you had in mind, but it seems like a reasonable approach as long as you're good with application development and able to create a solid system that handles most issues.

When a user edits a document in Microsoft Word, for instance, the file will change on the share and may be copied to X: even though the user is still working on it. Within Windows there is an API to check whether the file handle is still open by the user; if it is, you can hook the moment the user actually closes the document, so that all their edits are complete, and only then migrate the file to drive X:.
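A simpler, portable heuristic than hooking handle closes is to treat a file as "settled" only once its modification time has stopped moving for some quiet period. A sketch (Python for illustration; the 30-second threshold is an arbitrary assumption):

```python
# Heuristic for "is the user done editing?": only stage a file once its
# modification time has been stable for at least `quiet_secs` seconds.
import os
import time

def is_settled(path, quiet_secs=30, now=None):
    """True if the file hasn't been modified for at least quiet_secs."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) >= quiet_secs
```

This avoids the crashed-PC problem mentioned below, at the cost of delaying every transfer by the quiet period.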

That being said, if the user is working on the document and their PC crashes for some reason, the document's file handle may not get released until the document is opened at a later date, thus causing issues.

Answered by RobertPitt on Oct 15 '22