I've been playing around with the R interface to the Redis database, as well as the doRedis parallel backend for foreach. I have a couple of questions to help me better apply this tool:
I know this is a lot of questions, but I've been running into situations where my limited understanding of how parallel processing works has been hindering my ability to implement it. For example, I recently tried to parallelize a computation on a large database and caught myself passing the entire database to each node on my cluster, an operation which completely destroyed any advantage I'd gained from parallelizing.
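For reference, here is roughly the pattern I now think I should have used: let foreach iterate over pre-split chunks so that each task only ships its own piece of the data to a worker (the toy data frame, the chunk count, and the per-chunk computation below are made up for illustration):

```r
library(foreach)

# stand-in for my large database table
bigTable <- data.frame(x = rnorm(1e5), y = rnorm(1e5))

# Split the rows into chunks up front; foreach sends each chunk as its own
# task, so a worker only receives the piece it actually processes.
chunks <- split(bigTable, cut(seq_len(nrow(bigTable)), 4))

results <- foreach(piece = chunks, .combine = rbind) %dopar% {
  colMeans(piece)   # placeholder for the real per-chunk computation
}
```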
Thank you!
One piece of the puzzle is the rredis package.
1 - doRedis uses rredis. Specifically, doRedis.R uses redisRPush (as it iterates over the foreach items), and each redisWorker uses redisBRPop to grab an item from the Redis list whose name you gave your doRedis job queue.
Redis is not just a database. Here it is being used as a queue!
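Here is a minimal sketch of that producer/consumer pattern using rredis directly. The queue names "jobs" and "results" are just examples, and I'm going from memory on the exact call signatures, so double-check them against the rredis docs:

```r
library(rredis)

## master side: push tasks onto a Redis list
redisConnect()                       # defaults to localhost:6379
for (i in 1:10) redisRPush("jobs", i)

## worker side (a separate R session): block until a task arrives,
## process it, and push the result onto another list
redisConnect()
repeat {                             # runs forever, like a long-lived worker
  task <- redisBRPop("jobs")         # blocks until an item is available
  value <- task[[1]]                 # the popped element (rredis returns a list)
  redisRPush("results", value^2)     # "process" the task and store the result
}
```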
2 - You have one Redis instance, (remotely) accessible to all your R workers. Think of the Redis server as a distributed queue. Your job master pushes items onto a list, and workers grab an item, process it, and push the result onto the result list. You can have M workers for N items; it depends on what you want to do.
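For example (a hedged sketch; the queue name "jobs" and the worker count are arbitrary), the master registers the queue and submits N tasks, while any number of workers, local or on other machines pointed at the same Redis host, pull from it:

```r
library(doRedis)
library(foreach)

registerDoRedis("jobs")                   # master: register the work queue
startLocalWorkers(n = 2, queue = "jobs")  # spin up 2 workers on this machine
# on other machines you could run: redisWorker("jobs", host = "master-host")

results <- foreach(i = 1:100, .combine = c) %dopar% sqrt(i)  # N = 100 tasks

removeQueue("jobs")                       # clean up the queue when done
```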
3 - Use the env param. Under the hood that uses redisSet, and all workers have access to the value (via redisGet). You pass a delimited expression on the foreach side, and it is stored in a string key in Redis that the workers can read.
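You can mimic the same mechanism by hand with rredis if you want to share a value with all workers (the key name "myjob:env" and the parameter list are just examples):

```r
library(rredis)
redisConnect()

## master: store the shared object once under a known key
sharedParams <- list(threshold = 0.5, iterations = 100)
redisSet("myjob:env", sharedParams)

## any worker: fetch the shared object by the same key
params <- redisGet("myjob:env")
params$threshold
```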
4 - None that I know of (but that is hardly authoritative, so do ask around). I also suggest you read the provided source code; the answers above come straight from reading doRedis.R and redisWorker.R.
Hope this helps.
[P.S. Connect to your Redis server with telnet (or redis-cli) and issue the MONITOR command to watch the chatter back and forth.]