
Riak fails at MapReduce queries. Which configuration to use?

I am working on a Node.js application that uses Riak through riak-js, and I ran into the following problem:

Running this request

db.mapreduce
  .add('logs')
  .run();

correctly returns all 155,000 items stored in the bucket logs with their IDs:

[ 'logs', '1GXtBX2LvXpcPeeR89IuipRUFmB' ],
[ 'logs', '63vL86NZ96JptsHifW8JDgRjiCv' ],
[ 'logs', 'NfseTamulBjwVOenbeWoMSNRZnr' ],
[ 'logs', 'VzNouzHc7B7bSzvNeI1xoQ5ih8J' ],
[ 'logs', 'UBM1IDcbZkMW4iRWdvo4W7zp6dc' ],
[ 'logs', 'FtNhPxaay4XI9qfh4Cf9LFO1Oai' ],
....

If I specify a map function and use only a few of the items in the bucket logs

db.mapreduce
  .add([['logs', 'SUgJ2fhfgyR2WE87n7IVHyBi4C9'], ['logs', 'EMtywD1UFnsq9rNRuINLzDsHdh2'], ['logs', 'ZXPh5ws8mOdASQFEtLDk8CBRn8t']])
  .map( function(v) {return ["asd"]; } )
  .run();

everything works fine and the expected output is returned:

[ 'asd', 'asd', 'asd' ]

If I now want Riak to map all items (about 155,000 small JSON documents) in the bucket "logs"

db.mapreduce
  .add('logs')
  .map( function(v) {return ["asd"]; } )
  .run();

I only receive errors:

{ [Error: [object Object]] message: '[object Object]', statusCode: 500 }

What is happening here? The error object contains nothing useful.

Update: The Riak console logs the following message multiple times:

[notice] JS call failed: All VMs are busy.

After raising map_js_vm_count in Riak's app.config to 36, the message changes to:

[error] Pipe worker startup failed:fitting was gone before startup

Links: Basho Labs Riak Driver riak-js

Cornelius Schmale asked Nov 12 '12 14:11


1 Answer

Bryan from basho.com answered my question:

Hi, Cornelius. Could you describe your Riak configuration a bit? Specifically, how many nodes are in your cluster, and what is the ring_creation_size from your app.config?

If, for example, you're using a default setup {ring_creation_size, 64} on a one-node development cluster, this behavior is quite likely. 155,000 items is enough to get all 64 vnodes working.

In the first case, before raising map_js_vm_count, those 64 vnodes are fighting over just 8 JavaScript VMs, so some are likely to be starved long enough to time out, which causes the "All VMs are busy" log message.

In the second case, after raising map_js_vm_count, it's likely that those 36 JavaScript VMs just aren't able to process all 155,000 items before the query timeout arrives. The "fitting was gone before startup" log message is saying that the pipe running the query shut down while there were still inputs arriving at vnodes.
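The contention in both cases can be put into rough numbers. The following is only a back-of-the-envelope sketch using the figures mentioned above (64 vnodes, 8 and then 36 JavaScript VMs); the helper `vnodesPerVm` is a hypothetical name, and the ratio ignores everything else that affects throughput.

```javascript
// Rough contention estimate: how many vnodes share each JavaScript VM.
// Figures are taken from the discussion above; vnodesPerVm is a
// hypothetical helper, not part of Riak or riak-js.
function vnodesPerVm(ringSize, jsVmCount) {
  return ringSize / jsVmCount;
}

const before = vnodesPerVm(64, 8);    // 8 vnodes compete for each VM
const after = vnodesPerVm(64, 36);    // still more vnodes than VMs
const smallRing = vnodesPerVm(16, 8); // a ring size of 16 with 8 VMs

console.log(before, after.toFixed(2), smallRing); // prints: 8 1.78 2
```

Even with 36 VMs there are more vnodes than VMs, so raising the VM count alone reduces starvation but does not make the map phase itself any faster.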

You're not seeing either of these behaviors in the simple case with no map function because no interaction with JavaScript VMs is required. In addition, in that case objects are not even read from disk, which further reduces resource contention.

The two configuration changes I expect will help the most are lowering ring_creation_size and raising the query timeout. Lowering ring_creation_size to 16, or even 8, on a single-node cluster will cause less contention for JavaScript VMs because there will be less attempted parallelism in the map function processing. Raising the query timeout (it should be an argument to the 'run' function, or similar, but I'm not familiar with the riak-js client) will give the query more time to finish before shutting down, which may be necessary to overcome slow processing.
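For the timeout, a minimal sketch at the HTTP level, since how riak-js exposes this option is not confirmed here: Riak's HTTP MapReduce endpoint (POST /mapred) accepts a `timeout` field in milliseconds in the JSON job body. The helper name `buildMapredJob` and the 300000 ms value are assumptions chosen for illustration.

```javascript
// Hypothetical helper: builds the JSON body for a Riak MapReduce job
// submitted over HTTP (POST /mapred), including an explicit timeout.
function buildMapredJob(bucket, mapSource, timeoutMs) {
  return {
    inputs: bucket, // a bare bucket name means "all keys in this bucket"
    query: [
      { map: { language: 'javascript', source: mapSource } }
    ],
    timeout: timeoutMs // milliseconds
  };
}

const job = buildMapredJob(
  'logs',
  'function(v) { return ["asd"]; }',
  300000 // 5 minutes; an arbitrary illustrative value
);

// The serialized body would be sent with Content-Type: application/json.
console.log(JSON.stringify(job, null, 2));
```

If the client library offers its own timeout option, that is the more natural place to set it; the body above only shows what the server ultimately needs to receive.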

Rewriting your map function in Erlang should also help, since it will be faster and will not have the same sort of VM contention. But, I understand, that's not as easy to use in early-stage development.
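A quick way to try an Erlang map phase without writing any Erlang is to reference a function that ships with Riak. The sketch below assumes the built-in `riak_kv_mapreduce:map_object_value`, which returns each object's stored value rather than the constant "asd" from the question, so it is a stand-in and not an equivalent rewrite. The helper `erlangMapPhase` is a hypothetical name.

```javascript
// Hypothetical helper: describes a map phase that calls a compiled Erlang
// function instead of JavaScript source, so no JavaScript VM is involved.
function erlangMapPhase(moduleName, functionName) {
  return {
    map: { language: 'erlang', module: moduleName, function: functionName }
  };
}

// Job body for Riak's HTTP MapReduce endpoint (POST /mapred).
const job = {
  inputs: 'logs',
  query: [erlangMapPhase('riak_kv_mapreduce', 'map_object_value')]
};

console.log(JSON.stringify(job));
```

A custom Erlang map function would have to be compiled and made available on every node's code path before a job could reference it in the same way.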

HTH, Bryan

Cornelius Schmale answered Oct 18 '22 11:10