Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scala/Hadoop: Specifying Context for Reducer

Before I jump into playing with Scoobi or Scrunch, I thought I'd try to port WordCount to scala (2.9.1) using just Hadoop (0.20.1)'s java bindings.

Originally, I had:

class Map extends Mapper[LongWritable, Text, Text, IntWritable] {
  @throws[classOf[IOException]]
  @throws[classOf[InterruptedException]]
  def map(key : LongWritable, value : Text, context : Context) {
    //...

Which compiled fine, but gave me a runtime error:

java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, recieved org.apache.hadoop.io.LongWritable

After looking around a bit, I figured out that it was because I wasn't defining the proper map method (should have been cued off by the lack of override), so I fixed it to be:

override def map(key : LongWritable, value : Text, 
  context : Mapper[LongWritable, Text, Text, IntWritable]#Context) {

And voila, no runtime error.

But then I looked at the job output, and realized that my reducer wasn't getting run.

So I looked at my reducer, and noticed the reduce signature had the same problem as my mapper:

class Reduce extends Reducer[Text, IntWritable, Text, IntWritable] {
  @throws[classOf[IOException]]
  @throws[classOf[InterruptedException]]
  def reduce(key : Text, value : Iterable[IntWritable], context : Context) {
    //...

So I guessed the identity reduce was being used because of the mismatch.

But when I attempted to correct the signature of reduce:

override def reduce(key: Text, values : Iterable[IntWritable], 
  context : Reducer[Text, IntWritable, Text, IntWritable]#Context) {

I now got a compiler error:

[ERROR] /path/to/src/main/scala/WordCount.scala:32: error: method reduce overrides nothing
[INFO]     override def reduce(key: Text, values : Iterable[IntWritable], 

So I'm not sure what I'm doing wrong.

like image 679
rampion Avatar asked Mar 25 '12 01:03

rampion


Video Answer


1 Answers

At first glance, make sure that values is java.lang.Iterable, not scala Iterable. Either import java.lang.Iterable, or:

override def reduce(key: Text, values : java.lang.Iterable[IntWritable], context : Reducer[Text, IntWritable, Text, IntWritable]#Context)
like image 196
jwinder Avatar answered Sep 19 '22 20:09

jwinder