
What is the most robust way to append text to a single file from multiple connections?

I have seen plenty of questions about writing to a file, but I am wondering what the most robust way is to open a text file, append some data and close it again when many connections will be writing to it (i.e. in a parallel computing situation) and you can't guarantee when each connection will want to write.

For instance, the following toy example, which uses just the cores on my desktop, seems to work OK, but I am wondering whether this method will be prone to failure as the writes get longer and the number of processes writing to the file increases (especially across a network share, where there may be some latency).

Can anyone suggest a robust, definitive way to open, write to and close connections when other worker processes may want to write to the same file at the same time?

library(doParallel)
library(doRNG)

ncores <- 7
cl <- makeCluster( ncores , outfile = "" )
registerDoParallel( cl )

# Each task opens the shared file in append mode, writes one row, then closes it.
res <- foreach( j = 1:100 , .verbose = TRUE , .inorder = FALSE ) %dorng% {
    d <- matrix( rnorm( 1e3 , j ) , nrow = 1 )
    conn <- file( "~/output.txt" , open = "a" )
    # `append` only applies to file names; the "a" open mode already appends.
    write.table( d , conn , col.names = FALSE )
    close( conn )
}

stopCluster( cl )

I am looking for the best way to do this, if there even is one. Perhaps R and foreach take care of what I would call write-lock issues automagically?

Thanks.

Simon O'Hanlon asked Mar 08 '13 13:03



1 Answer

The foreach package doesn't provide a mechanism for file locking that would prevent multiple workers from writing to the same file at the same time. The result of doing that is going to depend on your operating system and file system. I'd be particularly worried about the results when using a distributed file system such as NFS.

Instead, I would change the way you open the output file to include the process ID of the worker:

conn <- file( sprintf("~/output_%d.txt" , Sys.getpid()) , open = "a" )

You could concatenate the files after the foreach loop returns if desired.
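As a rough sketch of that concatenation step in base R: the helper name combine_worker_files, the combined file name output_all.txt, and the temporary demo directory are all assumptions made for illustration, not part of the original answer. It assumes the per-worker files follow the output_<pid>.txt pattern used above.

```r
# Hypothetical helper: merge per-worker files into one combined file.
combine_worker_files <- function(dir,
                                 pattern = "^output_[0-9]+\\.txt$",
                                 combined = file.path(dir, "output_all.txt")) {
    # Collect the per-worker files; the pattern does not match the combined file.
    parts <- sort(list.files(dir, pattern = pattern, full.names = TRUE))
    # Start from an empty combined file so reruns do not duplicate rows.
    file.create(combined)
    for (p in parts) {
        # file.append() copies the contents of p onto the end of combined.
        file.append(combined, p)
    }
    invisible(parts)
}

# Demonstration with two fake worker files in a temporary directory.
demo_dir <- tempfile("workers_")
dir.create(demo_dir)
writeLines(c("a1", "a2"), file.path(demo_dir, "output_101.txt"))
writeLines("b1", file.path(demo_dir, "output_202.txt"))
combine_worker_files(demo_dir)
combined_lines <- readLines(file.path(demo_dir, "output_all.txt"))
```

Because this runs on the master after all workers have finished, only one process writes to the combined file, so no locking is needed at this stage.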

Of course, if you were using multiple machines, you might have two workers with the same process ID, so you could include the hostname in the file name as well, using Sys.info()[['nodename']], for example.
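One possible way to build such a file name is sketched below. The helper name worker_file and the output prefix are hypothetical; only Sys.info() and Sys.getpid() come from the answer itself, and the demo writes to a temporary directory rather than the home directory.

```r
# Hypothetical helper: build a per-worker file name from host name and PID.
worker_file <- function(dir = "~", prefix = "output") {
    host <- Sys.info()[["nodename"]]
    file.path(dir, sprintf("%s_%s_%d.txt", prefix, host, Sys.getpid()))
}

# Inside a worker, open the worker-specific file in append mode and write to it.
path <- worker_file(tempdir())
conn <- file(path, open = "a")
writeLines("example row", conn)
close(conn)
```

Each worker process then gets its own file even when PIDs collide across machines, assuming the host names themselves are distinct.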

Steve Weston answered Sep 27 '22 23:09