New to Hadoop and trying to understand the MapReduce WordCount example code from here.
The Mapper class signature from the documentation is -
Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
I see that in the mapreduce word count example the map code is as follows
public void map(Object key, Text value, Context context)
Question - What is the point of this key of type Object? If the input to a mapper is a text document, I am assuming the value in would be the chunk of text (64 MB or 128 MB) that Hadoop has partitioned and stored in HDFS. More generally, what is the use of this input key KEYIN in the map code?
Any pointers would be greatly appreciated.
A key-value pair is the record entity that Hadoop MapReduce accepts for execution. Hadoop is used mainly for data analysis, and it deals with structured, unstructured and semi-structured data. With Hadoop, if the schema is static we can work directly on the columns instead of key-value pairs.
Mapper is the base class used to implement Map tasks in Hadoop MapReduce. Maps are the individual tasks that run before the reducers and transform the inputs into a set of output values. These output values are intermediate values that act as the input to the Reduce task.
Mapper is the first code responsible for migrating/manipulating the data stored in HDFS blocks into key-value pairs. Hadoop assigns one map task to each block, i.e. if my data is on 20 blocks then 20 map tasks will run in parallel, and the mapper output gets stored on local disk.
Maps input key/value pairs to a set of intermediate key/value pairs. Maps are the individual tasks which transform input records into intermediate records. The transformed intermediate records need not be of the same type as the input records. A given input pair may map to zero or many output pairs.
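As a concrete illustration, here is a plain-Java simulation of the WordCount map step (no Hadoop dependency, names are my own for illustration): each input record - a (byte offset, line of text) pair - is transformed into zero or more intermediate (word, 1) pairs, mirroring what `context.write(word, one)` does in the real mapper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Plain-Java simulation of the WordCount map step, for illustration only.
// One input record is a (byte offset, line) pair; the output is a list of
// intermediate (word, 1) pairs, mirroring context.write(word, one).
public class MapSketch {
    static List<String[]> map(long keyIn, String valueIn) {
        List<String[]> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(valueIn);
        while (itr.hasMoreTokens()) {
            // the input key (byte offset) is simply ignored by WordCount
            out.add(new String[] { itr.nextToken(), "1" });
        }
        return out;
    }

    public static void main(String[] args) {
        // one input record: key = offset 0, value = the line text
        for (String[] kv : map(0L, "hello hadoop hello")) {
            System.out.println(kv[0] + "\t" + kv[1]);
        }
    }
}
```

Note that a single input pair here produces three output pairs, and the input key plays no role - which is exactly why WordCount can declare it loosely as Object.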
InputFormat describes the input specification for a MapReduce job. By default, Hadoop uses TextInputFormat, which inherits from FileInputFormat, to process the input files.
We can also specify the input format to use in the client or driver code:
job.setInputFormatClass(SomeInputFormat.class);
For the TextInputFormat, files are broken into lines. Keys are the position (byte offset) in the file, and values are the line of text.
In public void map(Object key, Text value, Context context), key is the line offset and value is the actual text of the line.
Please look at the TextInputFormat API: https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.html
By default, for the TextInputFormat the key is of type LongWritable and the value is of type Text. In your example, Object is specified in the place of LongWritable, as it is compatible (LongWritable, like every Java class, is a subtype of Object). You can also use LongWritable in the place of Object.
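To make the byte-offset keys concrete, here is a small dependency-free sketch (class and method names are my own) that assigns keys to lines the way TextInputFormat does: each line's key is the byte offset of the line's start within the file, and its value is the line's text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simulates how TextInputFormat produces input records: for each line,
// the key is the byte offset where the line starts in the file, and the
// value is the line's text. Plain-Java sketch, no Hadoop dependency.
public class OffsetDemo {
    static Map<Long, String> toRecords(String fileContents) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : fileContents.split("\n", -1)) {
            records.put(offset, line);
            offset += line.getBytes().length + 1; // +1 for the '\n' separator
        }
        return records;
    }

    public static void main(String[] args) {
        // "hello world" is 11 bytes, so the second line starts at offset 12
        Map<Long, String> r = toRecords("hello world\nfoo bar");
        System.out.println(r);
    }
}
```

So for a two-line file, the mapper is called once per line with keys 0 and 12 - values the framework computes for you, which WordCount is free to ignore.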