I'm using the mclapply function in the multicore package to do parallel processing. It seems that all the child processes it starts produce the same names for temporary files given by the tempfile function. For example, if I have four processors,
library(multicore)
mclapply(1:4, function(x) tempfile())
will give four identical filenames. Obviously I need the temporary files to be different so that the child processes don't overwrite each other's files. When tempfile is used indirectly, i.e. when I call some function that itself calls tempfile, I have no control over the filename.
Is there a way around this? Do other parallel processing packages for R (e.g. foreach) have the same problem?
Update: This is no longer an issue since R 2.14.1.
CHANGES IN R VERSION 2.14.0 patched:
[...]
o tempfile() on a Unix-alike now takes the process ID into account.
This is needed with multicore (and as part of parallel) because
the parent and all the children share a session temporary
directory, and they can share the C random number stream used to
produce the unique part. Further, two children can call
tempfile() simultaneously.
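To check the behaviour on a current installation, something like the following minimal sketch could be used. It assumes the parallel package (which provides mclapply in R 2.14.0 and later) instead of multicore; the choice of two cores and the variable names are illustrative only. Forking is not available on Windows, so the sketch falls back to a single core there.

```r
library(parallel)  # provides mclapply() in R >= 2.14.0

# mclapply() relies on forking, so use one core on Windows.
mc_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Ask each job for a temporary file name, as in the question.
tmp_names <- unlist(mclapply(1:4, function(x) tempfile(), mc.cores = mc_cores))

# anyDuplicated() returns 0 when all names are distinct.
anyDuplicated(tmp_names)
```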
I believe multicore spins off a separate process for each subtask. If that assumption is correct, then you should be able to use Sys.getpid() to "seed" tempfile:
tempfile(pattern=paste("foo", Sys.getpid(), sep=""))
Use the x in your function:
mclapply(1:4, function(x) tempfile(pattern=paste("file",x,"-",sep="")))
Because the parallel jobs all run at the same time, and because the random seed comes from the system time, running four instances of tempfile in parallel will typically produce the same results (if you have four cores, that is; with only two cores, you'll get two pairs of identical temp file names).
Better to generate the tempfile names first and give them to your function as an argument:
filenames <- tempfile( rep("file",4) )
mclapply( filenames, function(x){})
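As a fuller sketch of this approach, the example below generates the names in the parent and has each job write to its own file. It assumes the parallel package rather than multicore; n_jobs, mc_cores and the file contents are made up for illustration.

```r
library(parallel)

n_jobs <- 4
# Names are generated once, in the parent, before any child is forked.
filenames <- tempfile(rep("file", n_jobs))

# mclapply() relies on forking, so use one core on Windows.
mc_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each job writes only to the file name it was handed.
invisible(mclapply(seq_along(filenames), function(i) {
  writeLines(paste("result of job", i), filenames[i])
}, mc.cores = mc_cores))

# Read the results back in the parent, then clean up.
contents <- vapply(filenames, readLines, character(1), USE.NAMES = FALSE)
unlink(filenames)
```

Because the parent owns the list of names, it can also collect and delete the files afterwards without guessing what the children called them.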
If you're using someone else's function that has a tempfile call in it, then working the PID into the tempfile name by modifying the tempfile function, as previously suggested, is probably the simplest plan:
tempfile <- function( pattern = "file", tmpdir = tempdir(), fileext = ""){
.Internal(tempfile(paste("pid", Sys.getpid(), pattern, sep=""), tmpdir, fileext))}
mclapply( 1:4, function(x) tempfile() )
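A variant of the same idea that avoids calling .Internal() directly is sketched below. The name tempfile_pid is hypothetical, and the helper simply forwards to base::tempfile with the process ID added to the pattern. As far as I know, a replacement defined in the global environment is only picked up by your own code, not by functions inside packages, which look up tempfile through their namespace, so it is worth testing against the function you actually need to call.

```r
# Hypothetical helper: prefixes the pattern with the current process ID
# and delegates to the exported base::tempfile().
tempfile_pid <- function(pattern = "file", tmpdir = tempdir(), fileext = "") {
  base::tempfile(
    pattern = paste0("pid", Sys.getpid(), "-", pattern),
    tmpdir = tmpdir,
    fileext = fileext
  )
}

name <- tempfile_pid(fileext = ".txt")
basename(name)  # starts with "pid<process id>-file" and ends in ".txt"
```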