
Is Mapper Object of Hadoop Shared across Multiple Threads?

I'm wondering whether it is possible to add a member object that can be reused across multiple map() calls. For example, a StringBuilder:

private final StringBuilder builder = new StringBuilder(); // created once per mapper object

public void map(...){
    ...

    builder.setLength(0);
    builder.append(a);
    builder.append(b);
    builder.append(c);
    d = builder.toString();

    ...
}

Obviously, if the mapper object is shared across multiple threads, the builder object above will not behave as expected because of concurrent access from more than one thread.

So my question is: is it guaranteed that each thread in Hadoop uses its own dedicated mapper object? Or is that a configurable behavior?

Thanks

JRaSH asked Nov 04 '22 01:11


1 Answer

As long as you use your own mapper class and not the MultithreadedMapper class, there will be no problem. map() is called sequentially, not in parallel.
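
The sequential calling pattern can be illustrated without Hadoop. This is only a minimal sketch, not Hadoop code: ConcatMapper and runTask are hypothetical stand-ins for a mapper and for a loop that feeds records to one mapper instance from a single thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a mapper; not a Hadoop class.
class ConcatMapper {
    // One buffer per mapper instance, reused across map() calls.
    private final StringBuilder builder = new StringBuilder();

    String map(String a, String b, String c) {
        builder.setLength(0);                   // discard the previous record's content
        builder.append(a).append(b).append(c);
        return builder.toString();              // toString() returns a copy of the buffer content
    }
}

class SequentialDemo {
    // Assumed model of a task: one mapper instance, records fed one at a time from one thread.
    static List<String> runTask(String[][] records) {
        ConcatMapper mapper = new ConcatMapper();
        List<String> out = new ArrayList<>();
        for (String[] r : records) {
            out.add(mapper.map(r[0], r[1], r[2]));
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] records = { {"a", "b", "c"}, {"x", "y", "z"} };
        System.out.println(runTask(records)); // prints [abc, xyz]
    }
}
```

Because each call finishes before the next one starts, resetting the buffer with setLength(0) at the top of map() is enough. The same class would not be safe if several threads called map() on one instance at the same time.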

It is common to use a StringBuilder or other data structures to buffer a few objects between calls. But make sure you copy any input objects you want to keep: there is only one input object, and it is refilled over and over again to avoid lots of GC.
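
Why the copy matters can be shown with a small Hadoop-free sketch. ReuseDemo and its collect method are hypothetical; the single StringBuilder named reused plays the role of an input object that is refilled for every record.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; it only mimics the reuse of one input object.
class ReuseDemo {
    static List<String> collect(String[] records, boolean copy) {
        StringBuilder reused = new StringBuilder();   // the single, refilled input object
        List<CharSequence> kept = new ArrayList<>();
        for (String record : records) {
            reused.setLength(0);
            reused.append(record);                    // "fill" the object with the next record
            if (copy) {
                kept.add(reused.toString());          // snapshot of the current record
            } else {
                kept.add(reused);                     // same reference stored every time
            }
        }
        // Read the buffered items only after all records have been seen.
        List<String> result = new ArrayList<>();
        for (CharSequence cs : kept) {
            result.add(cs.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        String[] records = { "first", "second" };
        System.out.println(collect(records, false)); // prints [second, second]
        System.out.println(collect(records, true));  // prints [first, second]
    }
}
```

Without the copy, every buffered entry points at the same object and ends up showing the last record. In a real mapper the equivalent step would be something like new Text(value) before storing a Text input in a member collection.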

So there is no need to synchronize or take care of race conditions.

Thomas Jungblut answered Nov 09 '22 13:11