
How does doRedis work?

I've been playing around with the R interface to the redis database, as well as the doRedis parallel backend for foreach. I have a couple of questions, to help me better apply this tool:

  1. doMC, doSMP, doSNOW, etc. all seem to work by calling up worker processes on the same computer, passing them an element from a list and a function to apply, and then gathering the results. In the case of doMC, the workers share memory. However, I'm a little confused about how a database can provide this same functionality.
  2. When I add an additional slave computer to the doRedis job queue (as in this video), is the entire doRedis database being sent to the slave computer? Or is each slave sent just the data it needs at a particular moment (i.e. one element of a list and a function to apply)?
  3. How do I explicitly pass additional data and functions to the doRedis job queue that each slave will need to perform its computations?
  4. When using doRedis and foreach, are there any additional 'gotchas' that might not apply to other parallel backends?

I know this is a lot of questions, but I've been running into situations where my limited understanding of how parallel processing works has hindered my ability to implement it. For example, I recently tried to parallelize a computation on a large database and caught myself passing the entire database to each node on my cluster, an operation that completely destroyed any advantage I'd gained from parallelizing.

Thank you!

Zach asked Apr 23 '11 16:04



1 Answer

One piece of the puzzle is rredis, the R client for Redis.

1 - doRedis uses rredis. Specifically, doRedis.R issues the Redis RPUSH command as it iterates over the foreach items, and each redisWorker issues BRPOP to grab something from the Redis list (the one you named in your doRedis "job").

Redis is not just a database. Here it is being used as a queue!

2 - You have one Redis instance that is (remotely) accessible to all your R workers. Think of the Redis server as a distributed queue. Your job master pushes items to a list; each worker grabs an item, processes it, and pushes the result to the result list, so a worker only pulls the item it is about to process. You can have M workers for N items; the right ratio depends on what you want to do.
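To make the queue idea concrete, here is a minimal sketch in Python rather than R. It is not doRedis code: `queue.Queue` objects stand in for the two Redis lists, threads stand in for the R workers, and the names `jobs`, `results`, `STOP`, `worker` and `run_job` are all made up for illustration. It only shows the shape of the pattern described above, under those assumptions.

```python
import queue
import threading

# Stand-ins for two Redis lists: one for pending tasks, one for results.
# put() plays the role of the push, and the blocking get() plays the
# role of the blocking pop. No Redis server is involved here.
jobs = queue.Queue()
results = queue.Queue()

STOP = object()  # hypothetical sentinel that tells a worker to exit


def worker(fn):
    """Block until a task arrives, process it, and push the result."""
    while True:
        task = jobs.get()               # blocks until a task is available
        if task is STOP:
            break
        index, item = task
        results.put((index, fn(item)))  # push onto the result list


def run_job(items, fn, n_workers=3):
    """Master side: push N items, let M workers drain them, gather results."""
    items = list(items)
    threads = [threading.Thread(target=worker, args=(fn,))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for index, item in enumerate(items):
        jobs.put((index, item))         # one small task per item
    gathered = [results.get() for _ in items]
    for _ in threads:
        jobs.put(STOP)
    for t in threads:
        t.join()
    # Results arrive in completion order, so restore the original order.
    return [value for _, value in sorted(gathered)]


squares = run_job(range(10), lambda x: x * x)
print(squares)
```

The point to notice is that the master never talks to a worker directly. Both sides only talk to the queue, which is why adding another worker machine needs nothing more than network access to the same server.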

3 - Use the env param. It is stored with the Redis SET command, and all workers read it back with GET. You pass a delimited expression on the foreach side, and it is written to a string key in Redis that the workers can access.
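The idea of publishing shared data once under a key, instead of attaching it to every task, can be sketched the same way. Again this is Python and not doRedis code: a plain dict stands in for Redis string keys, `pickle` stands in for whatever serialization the real package uses, and the key name `job:env` and all function names are hypothetical.

```python
import pickle
import queue
import threading

store = {}               # stand-in for Redis string keys (SET / GET)
jobs = queue.Queue()     # stand-in for the Redis job list
results = queue.Queue()  # stand-in for the Redis result list
STOP = None              # hypothetical sentinel; real tasks here are strings


def publish_env(key, env):
    """Master side: serialize the shared data once and store it under a key."""
    store[key] = pickle.dumps(env)


def worker(env_key):
    """Fetch the shared data once, then process small tasks against it."""
    env = pickle.loads(store[env_key])
    lookup, scale = env["lookup"], env["scale"]
    while True:
        task = jobs.get()
        if task is STOP:
            break
        results.put((task, lookup[task] * scale))


def run(keys, env, n_workers=2):
    publish_env("job:env", env)
    threads = [threading.Thread(target=worker, args=("job:env",))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for k in keys:
        jobs.put(k)      # each task is just a key, not the data itself
    out = dict(results.get() for _ in keys)
    for _ in threads:
        jobs.put(STOP)
    for t in threads:
        t.join()
    return out


env = {"lookup": {"a": 1, "b": 2, "c": 3}, "scale": 10}
scaled = run(["a", "b", "c"], env)
print(scaled)
```

Each worker pays the cost of loading the shared data once, and every task after that is tiny. That is the opposite of shipping the whole data set along with each task.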

4 - None that I know of (but that is hardly authoritative, so do ask around). I also suggest you read the provided source code. The answers above come straight from reading doRedis.R and redisWorker.R.

Hope this helps.

[P.S. Telnet to your Redis server and issue the MONITOR command to watch the chatter back and forth.]

alphazero answered Oct 01 '22 23:10
