I'm wondering whether it is possible to add a member object that can be reused across multiple map() calls. For example, a StringBuilder:
private final StringBuilder builder = new StringBuilder();
public void map(...){
...
builder.setLength(0);
builder.append(a);
builder.append(b);
builder.append(c);
d = builder.toString();
...
}
Obviously, if the mapper object is shared across multiple threads, the builder object above will not behave as expected because of concurrent access from more than one thread.
So my question is: is it guaranteed that each thread in Hadoop uses its own dedicated mapper object? Or is that a configurable behavior?
Thanks
As long as you write your own mapper class rather than using MultithreadedMapper, there will be no problem: map() is called sequentially, not in parallel.
It is common to use a StringBuilder or other data structures to buffer a few objects between the calls.
But if you buffer the input objects themselves, make sure you clone them first: there is only one key object and one value object, and Hadoop refills them on every map() call to avoid creating lots of garbage for the GC.
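To illustrate why the copy matters, here is a minimal sketch in plain Java. MutableValue is a hypothetical stand-in for a reused Hadoop input object (it is not a Hadoop class), and the loop imitates the framework refilling the same instance for each record. With a real Text value, the copy could probably be made with its copy constructor, new Text(value).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a mutable, reused input object (not a Hadoop class).
final class MutableValue {
    private String content = "";

    void set(String s) { content = s; }

    // Returns an independent snapshot of the current content.
    MutableValue copy() {
        MutableValue m = new MutableValue();
        m.set(content);
        return m;
    }

    @Override
    public String toString() { return content; }
}

final class ReuseDemo {
    // Buffers the reference itself: every list element is the same object.
    static List<MutableValue> bufferWithoutCopy(String[] records) {
        MutableValue reused = new MutableValue();
        List<MutableValue> buffer = new ArrayList<>();
        for (String r : records) {
            reused.set(r);      // the "framework" refills the single instance
            buffer.add(reused); // stores a reference, not a snapshot
        }
        return buffer;
    }

    // Buffers a copy: each list element keeps the content it had when added.
    static List<MutableValue> bufferWithCopy(String[] records) {
        MutableValue reused = new MutableValue();
        List<MutableValue> buffer = new ArrayList<>();
        for (String r : records) {
            reused.set(r);
            buffer.add(reused.copy());
        }
        return buffer;
    }

    public static void main(String[] args) {
        String[] records = {"a", "b", "c"};
        System.out.println(bufferWithoutCopy(records)); // prints [c, c, c]
        System.out.println(bufferWithCopy(records));    // prints [a, b, c]
    }
}
```

Without the copy, every buffered entry shows the last record, because all entries point at the one refilled object.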
So there is no need to synchronize or take care of race conditions.
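As a rough sketch of the pattern from the question, the class below keeps one StringBuilder per mapper instance and resets it at the start of each call. ConcatMapper is a hypothetical stand-in written in plain Java, not the Hadoop Mapper API, and it assumes (as described above) that a single instance receives its map() calls one after another.

```java
// Hypothetical stand-in for a custom mapper; not the Hadoop Mapper API.
final class ConcatMapper {
    // One builder per mapper instance, reused across calls to avoid reallocation.
    private final StringBuilder builder = new StringBuilder();

    // Assumes calls arrive sequentially on this instance, so no synchronization is used.
    String map(String a, String b, String c) {
        builder.setLength(0); // discard the content left over from the previous call
        builder.append(a).append(b).append(c);
        return builder.toString();
    }
}

final class ConcatMapperDemo {
    public static void main(String[] args) {
        ConcatMapper mapper = new ConcatMapper();
        System.out.println(mapper.map("x", "y", "z")); // prints xyz
        System.out.println(mapper.map("1", "2", "3")); // prints 123
    }
}
```

The second call does not see leftovers from the first because setLength(0) clears the buffer. If the same instance were shared between threads, this code would no longer be safe.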