
Inter-process file exchange: efficiency and race conditions

The story:
A few days ago I was thinking about inter-process communication based on file exchange. Say process A creates several files during its work and process B reads these files afterwards. To ensure that all files were written completely, it would be convenient to create a special marker file whose existence signals that all operations are done.

Simple workflow:
process A creates file "file1.txt"
process A creates file "file2.txt"
process A creates file "processA.ready"

Process B is waiting until file "processA.ready" appears and then reads file1 and file2.
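
The workflow above can be sketched in Java roughly as follows. This is only an illustration: the class name `ReadyMarkerDemo`, the method names, the payload strings and the 50 ms polling interval are all assumptions, not part of the question. It also assumes both processes see the same local directory, and it does not address durability after a crash.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; both "processes" are plain methods here.
final class ReadyMarkerDemo {

    // Process A: write the data files first, create the marker file last.
    static void produce(Path dir) throws IOException {
        Files.write(dir.resolve("file1.txt"), "first payload".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("file2.txt"), "second payload".getBytes(StandardCharsets.UTF_8));
        // createFile throws if the marker already exists, so a stale marker gets noticed.
        Files.createFile(dir.resolve("processA.ready"));
    }

    // Process B: poll until the marker exists, then read both data files.
    static List<String> consume(Path dir, long timeoutMillis)
            throws IOException, InterruptedException {
        Path marker = dir.resolve("processA.ready");
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!Files.exists(marker)) {
            if (System.currentTimeMillis() > deadline) {
                throw new IOException("timed out waiting for " + marker);
            }
            Thread.sleep(50); // assumed polling interval
        }
        String first = new String(Files.readAllBytes(dir.resolve("file1.txt")), StandardCharsets.UTF_8);
        String second = new String(Files.readAllBytes(dir.resolve("file2.txt")), StandardCharsets.UTF_8);
        return Arrays.asList(first, second);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("file-ipc");
        produce(dir);
        System.out.println(consume(dir, 1000));
    }
}
```

In a real setup `produce` and `consume` would run in separate processes. The only ordering the sketch relies on is that the marker is created after the data files have been written.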

Doubts:
File operations are performed by the operating system, specifically by its file system layer. Since implementations can differ between Unix, Windows and macOS, I'm uncertain how reliable file-exchange inter-process communication is. Even if the OS guarantees this consistency, there are things like the JIT compiler in Java, which can reorder program instructions.

Questions:
1. Are there any real specifications on file operations in operating systems?
2. Is the JIT really allowed to reorder file operations within a single program thread?
3. Is file exchange still a relevant option for inter-process communication nowadays, or is it unconditionally better to choose TCP/HTTP/etc.?

AdamSkywalker asked Oct 29 '15 10:10


3 Answers

  1. You don't need to know OS details in this case. The Java IO API is documented well enough to tell you whether a file was saved or not.
  2. The JVM can't reorder native calls. The JMM does not state this explicitly, but it is implied: the JVM can't know what the effect of a native call is, so reordering those calls could be quite dangerous.
  3. There are some disadvantages to using files as a means of communication:
    1. It uses IO, which is slow.
    2. It is difficult to split the processes across different machines should you ever need to (there are ways, Samba for example, but they are quite platform-dependent).
AndreyTsarevskiy answered Nov 07 '22 21:11


  1. You could use a file watcher (WatchService) in Java to be notified when your .ready file appears.

  2. Reordering could apply, but it shouldn't hurt your application logic in this case. See the following link: https://assylias.wordpress.com/2013/02/01/java-memory-model-and-reordering/

  3. I don't know the size of your data, but I feel it would still be better to use a message queue (MQ) solution in this case. File IO is relatively slow and could slow down the system.
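
For point 1, a minimal WatchService sketch might look like the one below. The class name `ReadyFileWatcher` and the method `awaitFile` are made up for illustration. The sketch registers the directory first and only then checks whether the file already exists, so a marker created just before registration is not missed.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: waits for a named file to appear in a directory.
final class ReadyFileWatcher {

    // Returns true once fileName exists in dir, false if the timeout elapses first.
    static boolean awaitFile(Path dir, String fileName, long timeoutMillis)
            throws IOException, InterruptedException {
        try (WatchService watcher = dir.getFileSystem().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
            // Check after registering, in case the file was created earlier.
            if (Files.exists(dir.resolve(fileName))) {
                return true;
            }
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (true) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false;
                }
                WatchKey key = watcher.poll(remaining, TimeUnit.NANOSECONDS);
                if (key == null) {
                    return false; // timed out without any event
                }
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        // Events may have been lost, so fall back to a direct check.
                        if (Files.exists(dir.resolve(fileName))) {
                            return true;
                        }
                        continue;
                    }
                    // For ENTRY_CREATE the context is the path relative to dir.
                    Path created = (Path) event.context();
                    if (created.getFileName().toString().equals(fileName)) {
                        return true;
                    }
                }
                if (!key.reset()) {
                    return false; // directory is no longer accessible
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("watch-demo");
        Files.createFile(dir.resolve("processA.ready")); // stands in for process A
        System.out.println(awaitFile(dir, "processA.ready", 5000));
    }
}
```

How quickly events arrive depends on the platform's WatchService implementation, so the timeout should be chosen generously.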

Akshay Gehi answered Nov 07 '22 20:11


I used a file-exchange-based approach on one of my projects. It relies on renaming the file extension when a process finishes a step, so that the next process can pick the file up by matching on its name.

  1. The FTP process downloads a file and gives its name the '.downloaded' extension.
  2. The main task processor searches the directory for '*.downloaded' files.
    Before starting, the job renames the file to '.processing'.
    When finished, it renames the file to '.done'.
    In case of an error, it creates a new supplementary file with an '.error' extension and writes the last processed line and the exception trace there. On retry, if this file exists, the job reads it and resumes from the correct position.
  3. The locator process searches for '.done' files and, according to its config, moves them to a backup folder or deletes them.
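
The extension-renaming steps above could be sketched in Java along these lines. This is not the code from the project: `ExtensionStateDemo`, `transition` and `processAll` are hypothetical names, the "work" is a placeholder that just reads the file, and the '.error' resume logic is left out. It assumes the rename happens within one directory on a filesystem that supports atomic moves.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the '.downloaded' -> '.processing' -> '.done' steps.
final class ExtensionStateDemo {

    // Renames "name<from>" to "name<to>" inside the same directory.
    static Path transition(Path file, String from, String to) throws IOException {
        String name = file.getFileName().toString();
        if (!name.endsWith(from)) {
            throw new IllegalArgumentException(name + " does not end with " + from);
        }
        String base = name.substring(0, name.length() - from.length());
        // ATOMIC_MOVE is an assumption; it may be unsupported on some filesystems.
        return Files.move(file, file.resolveSibling(base + to), StandardCopyOption.ATOMIC_MOVE);
    }

    // Processes every '*.downloaded' file in dir and returns how many were handled.
    static int processAll(Path dir) throws IOException {
        List<Path> pending = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.downloaded")) {
            for (Path p : stream) {
                pending.add(p);
            }
        }
        int done = 0;
        for (Path p : pending) {
            Path working = transition(p, ".downloaded", ".processing");
            Files.readAllLines(working); // placeholder for the real processing
            transition(working, ".processing", ".done");
            done++;
        }
        return done;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("ext-demo");
        Files.write(dir.resolve("report.csv.downloaded"), "a,b\n".getBytes());
        System.out.println(processAll(dir) + " file(s) processed");
    }
}
```

Because each stage only looks for its own extension, a file that is still being worked on is invisible to the other stages.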

This approach works fine under a huge load in a mobile operator network.

One point to consider: using unique file names is important, because the behaviour of a file move differs between operating systems.
For example, Windows reports an error when a file with the same name already exists at the destination, whereas Unix overwrites it.
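
In Java, one way to sidestep the platform difference is to go through `Files.move`, which by default fails when the target exists unless `REPLACE_EXISTING` is passed. The sketch below is only an illustration: `SafeMoveDemo` and `moveToBackup` are hypothetical names, and the UUID suffix is just one possible way to get a unique name. It assumes the provider reports the collision as `FileAlreadyExistsException`, which the API documents as an optional specific exception.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical helper: never overwrites an existing file in the backup folder.
final class SafeMoveDemo {

    // Moves file into backupDir; on a name collision, retries with a unique suffix.
    static Path moveToBackup(Path file, Path backupDir) throws IOException {
        Path target = backupDir.resolve(file.getFileName());
        try {
            // No REPLACE_EXISTING here, so an existing target makes the move fail.
            return Files.move(file, target);
        } catch (FileAlreadyExistsException e) {
            // Assumed naming scheme: original name plus a random UUID.
            Path unique = backupDir.resolve(file.getFileName() + "." + UUID.randomUUID());
            return Files.move(file, unique);
        }
    }

    public static void main(String[] args) throws Exception {
        Path src = Files.createTempDirectory("src");
        Path backup = Files.createTempDirectory("backup");
        Files.write(src.resolve("a.done"), "new".getBytes());
        Files.write(backup.resolve("a.done"), "old".getBytes());
        System.out.println(moveToBackup(src.resolve("a.done"), backup));
    }
}
```

Generating the unique name up front, when the file is first created, would avoid the collision entirely and is closer to what the answer recommends.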

hsnkhrmn answered Nov 07 '22 20:11