I have seen plenty of questions about writing to a file, but I am wondering what the most robust way is to open a text file, append some data and then close it again when many connections will be writing to it (i.e. in a parallel computing situation) and you can't guarantee when each connection will want to write.
For instance, the following toy example, which uses just the cores on my desktop, seems to work OK, but I am wondering whether this method will be prone to failure if the writes get longer and the number of processes writing to the file increases (especially across a network share, where there may be some latency).
Can anyone suggest a robust, definitive way to open, write to and then close connections when other worker processes may want to write to the file at the same time?
require(doParallel)
require(doRNG)
ncores <- 7
# outfile = "" sends worker output to the master's console
cl <- makeCluster( ncores , outfile = "" )
registerDoParallel( cl )
res <- foreach( j = 1:100 , .verbose = TRUE , .inorder= FALSE ) %dorng%{
    # one row of 1000 random draws with mean j
    d <- matrix( rnorm( 1e3 , j ) , nrow = 1 )
    # every worker opens the same file in append mode
    conn <- file( "~/output.txt" , open = "a" )
    write.table( d , conn , append = TRUE , col.names = FALSE )
    close( conn )
}
I am looking for the best way to do this, if there even is a best way. Perhaps R and foreach take care of what I would call write-lock issues automagically?
Thanks.
Two processes successfully appending to the same file will result in all their bytes in the file in order, but not necessarily contiguously. The caveat is that not all filesystems are POSIX-compatible. Two famous examples are NFS and the Hadoop Distributed File System (HDFS).
The foreach package doesn't provide a mechanism for file locking that would prevent multiple workers from writing to the same file at the same time. The result of doing that is going to depend on your operating system and file system. I'd be particularly worried about the results when using a distributed file system such as NFS.
Instead, I would change the way you open the output file to include the process ID of the worker:
conn <- file( sprintf("~/output_%d.txt" , Sys.getpid()) , open = "a" )
You could concatenate the files after the foreach loop returns if desired.
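As a minimal sketch of that concatenation step, assuming the per-worker files all sit in one directory and follow the output_<pid>.txt naming used above, something like the following could be run on the master after the loop finishes. The helper name merge_worker_files and the destination name merged.txt are hypothetical, and the destination is assumed not to match the file pattern.

```r
# Hypothetical helper: merge all per-worker files in `dir` into `dest`.
# Assumes `dest` does not itself match `pattern`.
merge_worker_files <- function(dir, dest, pattern = "^output_.*\\.txt$") {
  # list.files() returns matching names in alphabetical order
  parts <- list.files(dir, pattern = pattern, full.names = TRUE)
  out <- file(dest, open = "w")
  on.exit(close(out))
  # copy each worker file, line by line, into the merged file
  for (p in parts) writeLines(readLines(p), out)
  invisible(parts)
}

# Small demonstration with two fake worker files in a temporary directory
demo_dir <- tempfile("demo_")
dir.create(demo_dir)
writeLines(c("a 1", "a 2"), file.path(demo_dir, "output_101.txt"))
writeLines("b 1", file.path(demo_dir, "output_102.txt"))
merged <- file.path(demo_dir, "merged.txt")
merge_worker_files(demo_dir, merged)
merged_lines <- readLines(merged)
```

Because only the master touches the merged file, and only after all workers are done, no two processes ever write to the same file at once.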
Of course, if you were using multiple machines, you might have two workers with the same process ID, so you could include the hostname in the file name as well, using Sys.info()[['nodename']], for example.
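A small sketch of how the hostname and process ID might be combined into one file name. The helper name worker_file and its arguments are hypothetical, not part of foreach or base R; only Sys.info() and Sys.getpid() come from the answer above.

```r
# Hypothetical helper: build a file name that is unique per worker
# by combining the machine's hostname with the worker's process ID.
worker_file <- function(dir = "~", prefix = "output") {
  host <- Sys.info()[["nodename"]]
  file.path(dir, sprintf("%s_%s_%d.txt", prefix, host, Sys.getpid()))
}

# Usage, as it might appear inside the foreach body
# (tempdir() is used here only to keep the demonstration self-contained)
path <- worker_file(tempdir())
conn <- file(path, open = "a")
write.table(matrix(1:3, nrow = 1), conn, col.names = FALSE)
close(conn)
n_lines <- length(readLines(path))
```

Each worker then appends only to its own file, so no locking is needed during the loop.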