
How can my Linux daemon know when a Windows program has stopped writing a file that I access through SAMBA?

I'm developing a system that interfaces with a USPS shipping package called Dazzle. Part of this system includes a monitoring daemon whose purpose is to take tab-separated value files, turn them into XML that Dazzle recognizes, and pass them to Dazzle for label generation. And this part works just fine. What I also want to do, however, is to parse the output file that Dazzle generates and import it into a database.

Note here that Dazzle runs on Windows. My monitoring daemon is written in Perl and runs on Linux. My Linux system has Dazzle's input and output directories mounted via Samba.

There is a measurable delay between the time Dazzle starts writing the output file and the time it finishes. How can I wait for Dazzle to finish writing the output file? I've tried opening the file and calling flock($fh, LOCK_SH) on it, but that didn't seem to help.

EDIT: I have an idea based on "mobrule"'s comment below. Dazzle writes its output file in XML. Each package in the shipment is enclosed in its own pair of tags, and the entire document is enclosed in a single outer tag. So, if I start reading the file before it's complete, I can simply wait for the closing outer tag before I take action.
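A minimal sketch of that check, written in Python rather than the Perl of the actual daemon. The function name `looks_complete` and the tag name passed to it are hypothetical; the real closing tag depends on what Dazzle writes.

```python
from pathlib import Path


def looks_complete(path, closing_tag):
    """Return True if the file's last non-whitespace bytes are closing_tag.

    closing_tag is whatever closes the outer element of the output file;
    the caller has to supply it, since it is not known here.
    """
    data = Path(path).read_bytes()
    return data.rstrip().endswith(closing_tag.encode("utf-8"))
```

This only says that the closing tag has arrived, not that the XML is valid, so a full parse is still worthwhile afterwards.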

Also, I should mention what I'm doing currently. When I detect that the output XML file has been created, I attempt to parse it. If parsing fails, I sleep and try again. If it fails again, I sleep twice as long and retry, and so on. This has worked pretty well in testing with a 64-second timeout.
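The retry loop described above could look roughly like this in Python (the daemon itself is Perl, so this is only an illustration of the logic). The name `parse_with_backoff`, the one-second first delay, and reading the 64 seconds as a cap on total waiting time are all assumptions.

```python
import time
import xml.etree.ElementTree as ET


def parse_with_backoff(path, first_delay=1.0, max_total=64.0, sleep=time.sleep):
    """Try to parse path as XML, doubling the delay after each failure.

    Returns the root element, or None once the next sleep would push the
    total time slept past max_total (assumed meaning of the timeout).
    """
    delay = first_delay
    waited = 0.0
    while True:
        try:
            return ET.parse(path).getroot()
        except (ET.ParseError, OSError):
            pass  # file missing, unreadable, or not yet well-formed
        if waited + delay > max_total:
            return None
        sleep(delay)
        waited += delay
        delay *= 2
```

The `sleep` parameter exists only so the loop can be exercised without real waiting.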

Kit Peters asked Feb 25 '10 at 15:02


3 Answers

There is no general and portable way to tell if some process has an open filehandle to some arbitrary file. You must make a judgement with your local knowledge of the situation.

In this case, it may be possible to query the process table on the Windows machine to see if the "Dazzle" program is still running. Or maybe your experience gives you other guidelines, like "Dazzle never takes more than 20 seconds to run when the input is reasonable" or "when Dazzle is running, it updates a file every couple of seconds. If the file hasn't been updated in, say, 10 seconds, then there's a very good chance that Dazzle is finished."
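The "hasn't been updated in 10 seconds" guideline can be sketched as a polling loop. This is Python rather than Perl, and `wait_until_quiet` plus its parameters are hypothetical names. It compares successive observations instead of comparing the file's timestamp with the local clock, on the assumption that the two machines' clocks may disagree.

```python
import os
import time


def wait_until_quiet(path, quiet_seconds=10.0, poll=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Block until path's size and mtime have not changed for quiet_seconds."""
    last_seen = None
    quiet_since = clock()
    while True:
        st = os.stat(path)
        seen = (st.st_size, st.st_mtime)
        if seen != last_seen:
            # The file changed (or this is the first look): restart the timer.
            last_seen = seen
            quiet_since = clock()
        elif clock() - quiet_since >= quiet_seconds:
            return
        sleep(poll)
```

A network mount may cache file attributes, so the result is best treated as a heuristic rather than proof that the writer is finished.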

But you don't necessarily have to wait until Dazzle is finished. It is perfectly OK to read the file at the same time Dazzle is writing to it -- see the perldoc for the seek function, paying attention to the part about "how to emulate tail -f". Then you can update your database while Dazzle is running.

This way, if you are too conservative about guessing when Dazzle has finished, your database will still be updated in a timely manner, and the only cost will be some useless seek and read calls on a filehandle at EOF.
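For illustration, here is the same "tail -f" idea in Python (the perldoc example is the Perl equivalent). The generator name `follow` and the `is_done` callback are hypothetical; `is_done` stands for whatever local knowledge says Dazzle has finished.

```python
import time


def follow(fh, is_done, poll=1.0, sleep=time.sleep):
    """Yield complete lines from binary file fh as they are appended.

    Stops at EOF once is_done() reports that the writer has finished.
    """
    buf = b""
    while True:
        # Sample is_done() before reading so a final write is not missed.
        done = is_done()
        chunk = fh.read(4096)
        if chunk:
            buf += chunk
            *lines, buf = buf.split(b"\n")  # keep any partial last line
            for line in lines:
                yield line
        elif done:
            if buf:
                yield buf  # trailing data without a newline
            return
        else:
            sleep(poll)  # at EOF for now; wait for more data
```

Each yielded line can be handed to the database import as soon as it arrives, so a conservative `is_done` only costs a few extra empty reads.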

mob answered Nov 19 '22 at 18:11


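This is probably not a great solution, but you could try to rename the file repeatedly, sleeping for a bit each time the rename fails. The idea is that the rename will presumably be refused for as long as Dazzle still has the file open.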
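A rough Python sketch of the rename probe follows (the daemon is Perl, where `rename` would play the same role). `claim_by_rename`, the attempt count, and the delay are hypothetical. Whether the rename really fails while the writer holds the file depends on how Dazzle opened it and on the file server; on a local Linux filesystem a rename succeeds even when the file is open.

```python
import os
import time


def claim_by_rename(src, dst, attempts=10, delay=1.0, sleep=time.sleep):
    """Try to rename src to dst, sleeping after each failed attempt.

    Returns True on success and False once all attempts have failed.
    """
    for _ in range(attempts):
        try:
            os.rename(src, dst)
            return True
        except OSError:
            sleep(delay)
    return False
```

A side benefit is that a successful rename moves the file out of the way, so the next output file cannot be confused with the one being imported.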

jlew answered Nov 19 '22 at 18:11


You could try taking a lock with LOCK_EX (combined with LOCK_NB, so that flock returns immediately instead of blocking). If the lock fails, that suggests the file is still being written. Spin like that until you obtain the lock, at which point Dazzle should be done. This would fail if Dazzle ever closes the file and reopens it in append mode, so it's not the best solution.
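The non-blocking lock attempt could be sketched like this in Python, where `fcntl.flock` corresponds to Perl's `flock`. The helper name `try_exclusive_lock` is hypothetical, and it is an assumption that a lock taken through the Samba mount reflects what the Windows program is doing; the asker's experience with LOCK_SH suggests it may not.

```python
import fcntl


def try_exclusive_lock(fh):
    """Return True if an exclusive lock on fh was obtained without blocking."""
    try:
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        # Another open file description currently holds a conflicting lock.
        return False
```

A caller would loop on this with a short sleep between attempts and release the lock with `fcntl.LOCK_UN` (or by closing the file) once the import is done.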

NG. answered Nov 19 '22 at 19:11