hadoop - map reduce task and static variable

I just started working on some Hadoop/HBase MapReduce jobs (using Cloudera) and I have the following question:

Let's say we have a Java class with a main method and a static variable. That class defines inner classes corresponding to the Mapper and Reducer tasks. Before launching the job, the main method initializes the static variable. This variable is read in the Mapper class. The class is then launched using 'hadoop jar' on a cluster.

My question: I don't see how Map and Reduce tasks on other nodes can see that static variable. Is there any "hadoop magic" that allows nodes to share a JVM or static variables? How can this even work? I have to work on a class doing just that, and I can't figure out how it can be correct on anything but a single-node cluster. Thank you.

asked Jun 18 '14 by jlb

1 Answer

In a distributed Hadoop cluster each Map/Reduce task runs in its own separate JVM. So there's no way to share a static variable between tasks running in different JVMs, let alone on different nodes: each JVM loads the class fresh, and the static variable simply holds its default value there.
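The point above can be demonstrated without Hadoop at all. This plain-Java sketch (the class name `StaticDemo` is made up for illustration) sets a static field in one JVM and then launches a second JVM on the same class, standing in for a task JVM on another node; the child only ever sees the field's default value:

```java
// Sketch: a static field assigned in one JVM is invisible to a second JVM
// running the same class -- which is why a value set in the driver's main()
// never reaches remote map/reduce tasks.
public class StaticDemo {
    static long someLong = 0;  // default value in every fresh JVM

    public static void main(String[] args) throws Exception {
        if (args.length > 0 && args[0].equals("child")) {
            // This JVM never executed the parent's assignment below.
            System.out.println("child sees someLong = " + someLong);  // prints 0
            return;
        }
        someLong = 1337;  // "initialized in main", as in the question
        System.out.println("parent sees someLong = " + someLong);  // prints 1337

        // Launch a second JVM on the same class, as Hadoop does on other nodes.
        Process p = new ProcessBuilder(
                System.getProperty("java.home") + "/bin/java",
                "-cp", System.getProperty("java.class.path"),
                "StaticDemo", "child")
            .inheritIO()
            .start();
        p.waitFor();
    }
}
```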

But if you want to share some immutable data between tasks, you can use the Configuration class: values set on the job configuration in the driver are serialized and shipped to every task, where they can be read back in setup():

// driver code
Configuration config = new Configuration();
config.setLong("foo.bar.somelong", 1337L);
...

// mapper code
public class SomeMapper ... {
    private long someLong = 0;

    @Override
    protected void setup(Context context) {
        Configuration config = context.getConfiguration();
        // getLong takes a default value, used when the key is absent
        someLong = config.getLong("foo.bar.somelong", 0L);
    }
}
answered Oct 06 '22 by shutty