How to pull data in the Map/Reduce functions?

Tags: pull, hadoop, mapreduce

According to Hadoop: The Definitive Guide:

The new API supports both a “push” and a “pull” style of iteration. In both APIs, key-value record pairs are pushed to the mapper, but in addition, the new API allows a mapper to pull records from within the map() method. The same goes for the reducer. An example of how the “pull” style can be useful is processing records in batches, rather than one by one.

Has anyone pulled data inside the map or reduce functions? I am looking for the relevant API or an example.

asked Sep 24 '11 08:09

Praveen Sripati

1 Answer

I posted a query @ [email protected] and got the answer.

In the new API, the next key-value pair can be pulled from the Context object that is passed to map(): call nextKeyValue() on it to advance to the next record, then read that record with getCurrentKey() and getCurrentValue().
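To make the pull loop concrete, here is a minimal sketch of batching records by pulling them from a context. It does not depend on Hadoop: `RecordContext`, `ListContext`, and `PullBatchSketch` are hypothetical stand-ins written for this example, and only the three method names (`nextKeyValue()`, `getCurrentKey()`, `getCurrentValue()`) mirror what Hadoop's `Mapper.Context` exposes. In a real job the same loop would most likely live in an overridden `Mapper.run(Context)`, since the default `run()` is itself a loop that calls `nextKeyValue()` and passes each record to `map()`.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the pull methods on Hadoop's Mapper.Context.
interface RecordContext<K, V> {
    boolean nextKeyValue();   // advance to the next record; false when input is exhausted
    K getCurrentKey();        // key of the record most recently advanced to
    V getCurrentValue();      // value of the record most recently advanced to
}

// Hypothetical in-memory context so the sketch can run without a cluster.
class ListContext<K, V> implements RecordContext<K, V> {
    private final Iterator<Map.Entry<K, V>> it;
    private Map.Entry<K, V> current;

    ListContext(List<Map.Entry<K, V>> records) {
        this.it = records.iterator();
    }

    public boolean nextKeyValue() {
        if (!it.hasNext()) {
            current = null;
            return false;
        }
        current = it.next();
        return true;
    }

    public K getCurrentKey() { return current.getKey(); }

    public V getCurrentValue() { return current.getValue(); }
}

class PullBatchSketch {
    // Pulls records one at a time and groups their values into batches of batchSize.
    // The final batch may be smaller if the input runs out first.
    static <K, V> List<List<V>> run(RecordContext<K, V> context, int batchSize) {
        List<List<V>> batches = new ArrayList<>();
        List<V> batch = new ArrayList<>();
        while (context.nextKeyValue()) {
            // With real Hadoop Writables, copy the value before storing it,
            // because the framework typically reuses the same object per record.
            batch.add(context.getCurrentValue());
            if (batch.size() == batchSize) {
                batches.add(batch);
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) {
            batches.add(batch);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Map.Entry<Long, String>> input = new ArrayList<>();
        for (long i = 0; i < 7; i++) {
            input.add(new AbstractMap.SimpleEntry<>(Long.valueOf(i), "line" + i));
        }
        // Prints [[line0, line1, line2], [line3, line4, line5], [line6]]
        System.out.println(run(new ListContext<>(input), 3));
    }
}
```

Pulling inside the loop is what lets the code decide how many records to consume before acting on them; with push-style iteration each map() call sees exactly one record.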

Is the performance of pull better than push in this scenario? Also, what are the scenarios in which the pull will be useful?

answered Sep 22 '22 10:09

Praveen Sripati
