Within my mapper I'd like to call external software installed on the worker node, outside of HDFS. Is this possible, and what is the best way to do it?
I understand that this may take away some of the advantages/scalability of MapReduce, but I'd like both to interact with HDFS and to call compiled/installed external software from within my mapper to process some data.
RecordWriter in Hadoop MapReduce: as we know, the Reducer takes the Mappers' intermediate output as its input and runs a reduce function on it to produce output that is again zero or more key-value pairs. The RecordWriter then writes these output key-value pairs from the Reducer phase to the output files.
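To make the role of RecordWriter concrete, here is a rough sketch of a custom RecordWriter that writes each key-value pair as a tab-separated line. The class name and the tab-separated format are assumptions made for illustration, not anything from the question:

```java
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Minimal illustrative RecordWriter: writes "key<TAB>value" lines to an output stream
public class TabSeparatedRecordWriter extends RecordWriter<Text, IntWritable> {
    private final DataOutputStream out;

    public TabSeparatedRecordWriter(DataOutputStream out) {
        this.out = out;
    }

    @Override
    public void write(Text key, IntWritable value) throws IOException {
        out.writeBytes(key.toString() + "\t" + value.get() + "\n");
    }

    @Override
    public void close(TaskAttemptContext context) throws IOException {
        out.close();
    }
}
```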
MapReduce assigns fragments of data across the nodes in a Hadoop cluster. The goal is to split a dataset into chunks and use an algorithm to process those chunks at the same time. The parallel processing on multiple machines greatly increases the speed of handling even petabytes of data.
A MapReduce program executes in three stages: the map stage, the shuffle stage, and the reduce stage. In the map stage, the mapper's job is to process the input data. Generally the input data is in the form of a file or directory and is stored in the Hadoop Distributed File System (HDFS).
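As a quick reminder of how those stages map onto code, here is a standard word-count style sketch (the class and field names are just placeholders):

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map stage: emit (word, 1) for every token in the input line
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reduce stage: after the shuffle groups values by key, sum the counts per word
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```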
Mappers (and reducers) are like any other process on the box: as long as the TaskTracker user has permission to run the executable, there is no problem doing so. There are a few ways to call external processes, but since we are already in Java, ProcessBuilder seems a logical place to start.
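For example, a mapper could pipe each record through an external binary roughly like this. The path "/usr/local/bin/mytool" is a placeholder for whatever is installed on your worker nodes, and the sketch assumes the tool reads stdin and writes stdout:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch of a mapper that runs an external program once per input record
public class ExternalToolMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("/usr/local/bin/mytool");
        pb.redirectErrorStream(true);   // merge stderr into stdout
        Process proc = pb.start();

        // Feed the record to the external program's stdin
        proc.getOutputStream().write(value.toString().getBytes(StandardCharsets.UTF_8));
        proc.getOutputStream().close();

        // Collect the program's stdout to emit as the map output value
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(proc.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        }

        int exitCode = proc.waitFor();
        if (exitCode != 0) {
            throw new IOException("mytool exited with code " + exitCode);
        }
        context.write(value, new Text(out.toString()));
    }
}
```

Starting one process per record can be expensive; for heavy tools you may want to start the process once per map task (e.g. in setup()) and stream records through it, but that is a design choice that depends on your tool.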
EDIT: Just found that Hadoop has a class explicitly for this purpose: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/Shell.html
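Using that class, the same call could look roughly like this (the command and argument are again placeholders, and this is only a sketch of how ShellCommandExecutor is typically used):

```java
import java.io.IOException;
import org.apache.hadoop.util.Shell.ShellCommandExecutor;

// Run an external command through Hadoop's Shell utility and capture its stdout
public class ShellExample {
    public static String runTool(String inputPath) throws IOException {
        ShellCommandExecutor exec =
            new ShellCommandExecutor(new String[] { "/usr/local/bin/mytool", inputPath });
        exec.execute();          // runs the command and waits for it to finish
        return exec.getOutput(); // captured stdout of the command
    }
}
```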