How would I go about blocking access to a file until a specific function that both reads from and writes to that file has returned?
I often want to create some sort of central registry and there might be more than one R process involved in reading from and writing to that registry (in kind of a "poor man's parallelization" setting where different processes run independently from each other except with respect to the registry access).
I would not like to depend on any DBMS such as SQLite, PostgreSQL, MongoDB etc. early on in the development process. And even though I might use a DBMS later, a filesystem-based solution might still be a handy fallback option. Thus I'm curious how I could realize it with (ideally) base R functionality.
I'm aware that having a lot of reads and writes to the file system in a parallel setting is not very efficient compared to DBMS solutions.
I'm running on MS Windows 8.1 (64 Bit)
What exactly happens when two or more R processes try to write to or read from a file at the same time? Does the OS figure out the "access order" automatically, and does the process that "came in second" wait, or does it trigger an error because the file access is blocked by the first process? How could I prevent the second process from returning with an error and have it "just wait" until it is its turn?
Besides the rredis Package: are there any other options for shared memory on MS Windows?
Path to registry file:
path_registry <- file.path(tempdir(), "registry.rdata")
Example function that registers events:
registerEvent <- function(
  id=gsub("-| |:", "", Sys.time()),
  values,
  path_registry
) {
  # Read the registry from disk, creating the file on first use
  if (!file.exists(path_registry)) {
    registry <- new.env()
    save(registry, file=path_registry)
  } else {
    load(path_registry)
  }
  message("Simulated additional runtime between reading and writing (5 seconds)")
  Sys.sleep(5)
  # Register only IDs that are not present yet, then write the registry back
  if (!exists(id, envir=registry, inherits=FALSE)) {
    assign(id, values, registry)
    save(registry, file=path_registry)
    message(sprintf("Registering with ID %s", id))
    out <- TRUE
  } else {
    message(sprintf("ID %s already registered", id))
    out <- FALSE
  }
  out
}
Example content that is registered:
x <- new.env()
x$a <- TRUE
x$b <- letters[1:5]
Note that the content is usually "nested", i.e. an RDBMS would not really be "useful" anyway, or would at least involve some normalization steps before writing to the DB. That's why I prefer environments (unique variable IDs, and pass-by-reference is possible) over lists and, if one does make the step to a true DBMS, I would rather turn to NoSQL approaches such as MongoDB.
Registration cycle:
The actual calls might be spread over different processes, so there is a possibility of concurrent access attempts.
I want other processes/calls to "wait" until a registerEvent read-write cycle is finished before doing their own read-write cycle (without triggering errors).
registerEvent(values=list(x_1=x, x_2=x), path_registry=path_registry)
registerEvent(values=list(x_1=x, x_2=x), path_registry=path_registry)
registerEvent(id="abcd", values=list(x_1=x, x_2=x),
path_registry=path_registry)
registerEvent(id="abcd", values=list(x_1=x, x_2=x),
path_registry=path_registry)
Check registry content:
load(path_registry)
ls(registry)
See the filelock R package, available since 2018. It is cross-platform; I am using it on Windows and have not run into a single problem.
Make sure to read the documentation.
?filelock::lock
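As a minimal sketch of how the question's read-write cycle could be guarded with filelock: the wrapper name registerEventLocked and the path_lock argument are hypothetical and not part of the original post, and the simulated 5-second delay is left out. Note that the timeout of filelock::lock() is given in milliseconds and that the function returns NULL if the lock could not be acquired in time.

```r
# Hypothetical wrapper (name and arguments are illustrative only): serialises
# the read-modify-write cycle on the registry through a separate lock file.
registerEventLocked <- function(id, values, path_registry,
                                path_lock = paste0(path_registry, ".lock"),
                                timeout = Inf) {
  # Waits until the exclusive lock is acquired; NULL is returned on timeout
  lock <- filelock::lock(path_lock, exclusive = TRUE, timeout = timeout)
  if (is.null(lock)) {
    stop("Could not acquire lock on ", path_lock)
  }
  # Release the lock however the function exits
  on.exit(filelock::unlock(lock), add = TRUE)

  registry <- new.env()
  if (file.exists(path_registry)) {
    load(path_registry)  # restores the object `registry` into this frame
  }
  if (exists(id, envir = registry, inherits = FALSE)) {
    return(FALSE)        # ID already registered, nothing is written
  }
  assign(id, values, envir = registry)
  save(registry, file = path_registry)
  TRUE
}
```

With the default timeout = Inf, a second process calling this function simply waits until the first one has released the lock, rather than failing.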
Although the docs suggest leaving the lock file in place, I have had no problems removing it on function exit in a multi-process environment:
on.exit({filelock::unlock(lock); file.remove(path.lock)})
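For the base-R-only fallback asked about in the question, one possible approach is a lock directory, sketched below. This is a different technique from filelock, and the helper withDirLock and all of its arguments are hypothetical. It assumes that directory creation is atomic on the file system in use, which may not hold on some network shares, and it leaves a stale lock behind if a process dies while holding it.

```r
# Hypothetical helper (not from the original post): runs `fun` while holding
# a lock that is represented by the existence of the directory `path_lock`.
withDirLock <- function(path_lock, fun, timeout = 60, poll = 0.1) {
  start <- Sys.time()
  # dir.create() returns FALSE if the directory already exists, i.e. if
  # another process currently holds the lock; poll until it succeeds
  while (!dir.create(path_lock, showWarnings = FALSE)) {
    waited <- as.numeric(difftime(Sys.time(), start, units = "secs"))
    if (waited > timeout) {
      stop("Timed out waiting for lock: ", path_lock)
    }
    Sys.sleep(poll)
  }
  # Release the lock when `fun` returns or fails
  on.exit(unlink(path_lock, recursive = TRUE), add = TRUE)
  fun()
}
```

A call such as withDirLock(paste0(path_registry, ".lockdir"), function() registerEvent(id="abcd", values=list(x_1=x, x_2=x), path_registry=path_registry)) would then make concurrent callers wait for each other, up to timeout seconds.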