
When using HBase as a source for MapReduce, can I extend TableInputFormatBase to create multiple splits and multiple mappers for each region?

I'm thinking about using HBase as a source for one of my MapReduce jobs. I know that TableInputFormat specifies one input split (and thus one mapper) per Region. However, this seems inefficient. I'd really like to have multiple mappers working on a given Region at once. Can I achieve this by extending TableInputFormatBase? Can you please point me to an example? Furthermore, is this even a good idea?

Thanks for the help.

sangfroid asked Jun 14 '12 18:06



1 Answer

You need a custom input format that extends InputFormat. You can get an idea of how to do this from the answer to the question "I want to scan lots of data (Range based queries), what all optimizations I can do while writing the data so that scan becomes faster". This is a good idea only if processing the data takes much longer than retrieving it; if the job is dominated by reading from HBase, extra mappers per region will mostly contend for the same region server.
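To make the idea concrete, here is a minimal sketch of the key-range arithmetic such an input format could use. It is not HBase code: the class name RegionRangeSplitter and its split method are hypothetical, it uses only the Java standard library, and it assumes both region boundary keys are non-empty (the first and last regions of a table have empty start or end keys, which would need separate handling). It treats the keys as unsigned integers and computes evenly spaced interior boundaries, so a region [start, end) becomes n sub-ranges that a custom getSplits could turn into one split (and one mapper) each.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: divides one region's key range
// into n contiguous sub-ranges of roughly equal width.
public class RegionRangeSplitter {

    // Returns n + 1 boundaries; sub-range i is [result[i], result[i + 1]).
    // The first and last boundaries are the original start and end keys.
    public static List<byte[]> split(byte[] start, byte[] end, int n) {
        if (start.length == 0 || end.length == 0) {
            // Assumption of this sketch: open-ended regions are not handled.
            throw new IllegalArgumentException("empty boundary keys are not supported");
        }
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        // Right-pad both keys with 0x00 to a common width so they can be
        // compared and interpolated as unsigned big-endian integers.
        int width = Math.max(start.length, end.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(start, width));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(end, width));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("start key must sort before end key");
        }
        BigInteger span = hi.subtract(lo);
        BigInteger parts = BigInteger.valueOf(n);

        List<byte[]> boundaries = new ArrayList<>();
        boundaries.add(start);
        for (int i = 1; i < n; i++) {
            // Interior boundary: lo + span * i / n (integer division).
            // If the range is narrower than n, neighbouring boundaries can
            // coincide, giving empty sub-ranges that a caller should skip.
            BigInteger point = lo.add(span.multiply(BigInteger.valueOf(i)).divide(parts));
            boundaries.add(toFixedWidth(point, width));
        }
        boundaries.add(end);
        return boundaries;
    }

    // Converts a non-negative value to exactly `width` bytes, big-endian.
    private static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte or be shorter
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }
}
```

For example, splitting the range from the single byte 0x00 to the single byte 0x40 into 4 parts yields the boundaries 0x00, 0x10, 0x20, 0x30, 0x40. Even spacing in key space only gives even work per mapper if row keys are roughly uniformly distributed within the region, which is an assumption you would need to check against your own key design.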

Alexander Kuznetsov answered Oct 26 '22 22:10
