Increase number of Hive mappers in Hadoop 2

I created an HBase table from Hive and I'm trying to do a simple aggregation on it. This is my Hive query:

from my_hbase_table 
select col1, count(1) 
group by col1;

The MapReduce job spawns only 2 mappers and I'd like to increase that. With a plain MapReduce job I would configure the YARN and mapper memory settings to increase the number of mappers. I tried the following in Hive but it did not work:

set yarn.nodemanager.resource.cpu-vcores=16;
set yarn.nodemanager.resource.memory-mb=32768;
set mapreduce.map.cpu.vcores=1;
set mapreduce.map.memory.mb=2048;

NOTE:

My test cluster has only 2 nodes
The HBase table has more than 5M records
Hive logs show HiveInputFormat and a number of splits=2

asked May 13 '15 17:05

Marsellus Wallace

2 Answers

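Reduce the maximum input split size below its default value. Smaller splits mean more splits, and therefore more mappers.

SET mapreduce.input.fileinputformat.split.maxsize;

As written, this statement only prints the current value. Append =<size in bytes> to change it.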
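To make the effect of this setting concrete, here is a rough sketch in Python of how a file-based input format derives the split count. It assumes the usual FileInputFormat rule, split size = max(minSize, min(maxSize, blockSize)), plus the 10% slack Hadoop allows on the last split. The function names and the 1 GiB / 128 MiB figures are made up for illustration. It only models file-based input; for an HBase-backed table the split count generally follows the number of regions, so this setting may have no effect there.

```python
# Slack factor: a trailing chunk up to 10% larger than the split size is kept whole.
SPLIT_SLOP = 1.1

def compute_split_size(block_size, min_size, max_size):
    """Split size rule assumed here: max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))

def count_splits(file_size, split_size):
    """Count the splits (and hence mappers) produced for a single file."""
    splits = 0
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1  # whatever is left becomes the final split
    return splits

MIB = 1024 * 1024
file_size = 1024 * MIB   # hypothetical 1 GiB input file
block_size = 128 * MIB   # hypothetical HDFS block size

# With an effectively unlimited maxsize, the block size decides the split size.
default_split = compute_split_size(block_size, min_size=1, max_size=2**63 - 1)
# Lowering maxsize to 32 MiB makes each split smaller than a block.
smaller_split = compute_split_size(block_size, min_size=1, max_size=32 * MIB)

print(count_splits(file_size, default_split))
print(count_splits(file_size, smaller_split))
```

Lowering the maximum from "unlimited" to 32 MiB raises the split count for this hypothetical file from 8 to 32.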

answered Oct 03 '22 10:10

Partha Kaushik

Making splits smaller than the default is not an efficient solution. Splitting is mainly useful when dealing with large datasets. The default split size is already small, so it is not worth splitting the input further.

I would recommend applying the following configuration before your query. Adjust it to suit your input data.

set hive.merge.mapfiles=false;

set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

set mapred.map.tasks = XX;

If you also want to set the number of reducers, use the configuration below:

set mapred.reduce.tasks = XX;

Note that on Hadoop 2 (YARN), mapred.map.tasks and mapred.reduce.tasks are deprecated and have been replaced by other variables:

mapred.map.tasks     -->    mapreduce.job.maps
mapred.reduce.tasks  -->    mapreduce.job.reduces

See the following links for more on this:

http://answers.mapr.com/questions/5336/limit-mappers-and-reducers-for-specific-job.html

Fail to Increase Hive Mapper Tasks?

How mappers get assigned

The number of mappers equals the number of input splits, which are computed by the InputFormat used in the MapReduce job. For a typical file-based InputFormat, the split count grows with the number of files and with their sizes.

Suppose your HDFS block size is 64 MB (the default in Hadoop 1; Hadoop 2 defaults to 128 MB) and you have a single 100 MB file. It occupies 2 blocks, so 2 mappers are assigned, one per block.

If instead you have 2 files of 30 MB each, each file occupies its own block, so again 2 mappers are assigned, one per file.
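The arithmetic in these two examples can be checked with a small Python sketch. It is a simplified model that assumes one mapper per block and ignores split slack and any min/max split settings; the function name is hypothetical.

```python
import math

def mappers_for_files(file_sizes_mb, block_size_mb):
    """One mapper per block; a file never shares a block with another file."""
    return sum(math.ceil(size / block_size_mb) for size in file_sizes_mb)

# One 100 MB file with 64 MB blocks
print(mappers_for_files([100], 64))
# Two 30 MB files with 64 MB blocks
print(mappers_for_files([30, 30], 64))
```

Both cases come out at 2 mappers, but for different reasons: the first because one file spans two blocks, the second because two files cannot share a block.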

When you are working with a large number of small files, Hive uses CombineHiveInputFormat by default. In MapReduce terms, this ultimately means using CombineFileInputFormat, which creates virtual splits spanning multiple files, grouped by node and rack where possible. The size of the combined split is determined by

mapred.max.split.size
or 
mapreduce.input.fileinputformat.split.maxsize ( in yarn/MR2);

So if you want fewer splits (and therefore fewer mappers), set this parameter higher. Conversely, set it lower to get more mappers.
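As a rough illustration of how the maximum combined split size changes the mapper count for many small files, here is a simplified Python model. It assumes files are packed greedily in order until a split reaches the maximum size, and it ignores the node and rack grouping that the real CombineFileInputFormat performs. The function name and file sizes are hypothetical.

```python
def count_combined_splits(file_sizes, max_split_size):
    """Greedy model: keep adding files to a split until it reaches max_split_size."""
    splits = 0
    current = 0
    for size in file_sizes:
        current += size
        if current >= max_split_size:
            splits += 1   # close the current combined split
            current = 0
    if current > 0:
        splits += 1       # leftover files form the last split
    return splits

MIB = 1024 * 1024
small_files = [10 * MIB] * 20   # hypothetical: twenty 10 MiB files

for max_mib in (32, 64, 256):
    print(max_mib, count_combined_splits(small_files, max_mib * MIB))
```

Under this model, the same twenty files yield 5 splits at 32 MiB, 3 at 64 MiB, and a single split at 256 MiB.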

This question explains it in more detail:

What is the default size that each Hadoop mapper will read?

Also, how many mappers and reducers can run at the same time always depends on the resources available in your cluster (mapper and reducer slots in MR1, containers on YARN).

answered Oct 03 '22 09:10

Sandeep Singh

Tags: java, hadoop, hadoop2, hive, hbase
