
error (Reducer: ) when attempting to do distinct reduce

I am getting an error when trying to do a DISTINCT reduce that I got from here. I have reproduced the error on the beer-sample bucket, so it should be easy to reproduce. I have not seen any errors in the mapreduce_errors.txt file, or anything in the other log files that would point me anywhere. (If you would like me to search or post snippets of other files, please ask.)

Running Couchbase Enterprise 4 beta on Windows 2008 R2 (this also happened on the 3.0.1 Community Edition).

Here is my map function (using the beer-sample bucket that ships with Couchbase):

function(doc, meta) {
  switch(doc.type) {
  case "brewery":
    emit(meta.id);
    break;
  }
}

Here is my reduce function:

function(keys, values, rereduce) {
  return keys.filter(function (e, i, arr) {
    return arr.lastIndexOf(e) === i;
  });
}

This is the error:

reason: error (Reducer: )

Also an imgur of the view page if it helps: http://i.imgur.com/KyLutMc.png

asked Jul 01 '15 by ugh StackExchange



1 Answer

The problem lies within your custom reduce function: you're not handling the case when it's being called as part of a re-reduce.
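
You can see the failure outside Couchbase by calling the original reduce body as plain JavaScript with keys set to null, which is what the view engine passes during a re-reduce (a minimal sketch, not run through the view engine itself):

function reduce(keys, values, rereduce) {
  // The original reduce body, called the way the engine calls it during a
  // re-reduce, i.e. with keys === null.
  return keys.filter(function (e, i, arr) {
    return arr.lastIndexOf(e) === i;
  });
}

reduce(null, [["brewery_a"], ["brewery_b"]], true);
// Throws a TypeError because null has no .filter method, which is presumably
// what surfaces as "error (Reducer: )".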

As per Couchbase documentation:

The base format of the reduce() function is as follows:

function(key, values, rereduce) {
    ...
    return retval;
}

The reduce function is supplied three arguments:

key: The key is the unique key derived from the map() function and the group_level parameter.

values: The values argument is an array of all of the values that match a particular key. For example, if the same key is output three times, values will be an array of three items, each containing the value output by the emit() function.

rereduce: The rereduce indicates whether the function is being called as part of a re-reduce, that is, the reduce function being called again to further reduce the input data.

When rereduce is false:

  • The supplied key argument will be an array where the first element is the key as emitted by the map function and the second element is the ID of the document that generated the key.

  • The values is an array of values where each element of the array matches the corresponding element within the array of keys.

When rereduce is true:

  • key will be null.

  • values will be an array of values as returned by a previous reduce() function. The function should return the reduced version of the information, and the format of the return value should match the format required for the specified key.

The important point here is that sometimes you'll receive the keys argument with a value of null.
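
To make that concrete, here is a rough sketch of the two ways the view engine calls the function (the example values are hypothetical and only meant to show the shapes of the arguments):

// reduce phase (rereduce === false):
//   keys   -> an array derived from the keys emitted by map()
//   values -> the corresponding emitted values
//
// re-reduce phase (rereduce === true):
//   keys   -> null
//   values -> an array of results returned by earlier reduce() calls,
//             e.g. [["brewery_a", "brewery_b"], ["brewery_b", "brewery_c"]]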

According to the docs, your reduce() function should handle the case when rereduce is true, and in that case keys will be null. For your reduce() function, you could do something like this:

function(keys, values, rereduce) {
  if (rereduce) {
    // values is an array of arrays (the outputs of previous reduce() calls),
    // so flatten it first...
    var result = [];
    for (var i = 0; i < values.length; i++) {
      var distinct = values[i];
      for (var j = 0; j < distinct.length; j++) {
        result.push(distinct[j]);
      }
    }
    // ...then drop any duplicates introduced by the merge.
    return result.filter(function (e, i, arr) {
      return arr.lastIndexOf(e) === i;
    });
  }

  // rereduce === false: keys holds the emitted keys, so just de-duplicate them.
  return keys.filter(function (e, i, arr) {
    return arr.lastIndexOf(e) === i;
  });
}

Here, I first handle the re-reduce phase: I flatten the array of arrays received in the values argument and then remove the duplicates that might have appeared after the merge.

Then comes your original code, which returns the keys argument array without duplicates.
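
As a quick sanity check, this is roughly what the two branches produce if you call the corrected function above as plain JavaScript outside Couchbase (assigning it to a local variable, say distinctReduce, with made-up input values):

// rereduce === false: keys come straight from map(); duplicates are dropped.
distinctReduce(["a", "b", "a"], null, false);
// -> ["b", "a"]   (lastIndexOf keeps the last occurrence of each key)

// rereduce === true: values is an array of earlier reduce() results,
// which gets flattened and then de-duplicated.
distinctReduce(null, [["a", "b"], ["b", "c"]], true);
// -> ["a", "b", "c"]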

To test that this reduce() function actually works, I've used the following map() function:

function(doc, meta) {
  switch(doc.type) {
  case "brewery":
    emit(meta.id, null);
    emit(meta.id, null);
    break;
  }
}

This intentionally generates duplicates, which are then removed by the reduce() function.

answered Sep 30 '22 by fps