
Running Hadoop MapReduce: is it possible to call external executables outside of HDFS?

Within my mapper I'd like to call external software installed on the worker node outside of the HDFS. Is this possible? What is the best way to do this?

I understand that this may take away some of the advantages/scalability of MapReduce, but I'd like to both work with data in HDFS and call compiled/installed external software from within my mapper to process some data.

Joris asked Sep 03 '11 04:09




1 Answer

Mappers (and reducers) are like any other process on the box: as long as the TaskTracker user has permission to run the executable, there is no problem doing so. There are a few ways to call external processes, but since we are already in Java, ProcessBuilder seems a logical place to start.
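As a rough illustration of the ProcessBuilder approach, here is a minimal sketch of a helper that a mapper could call. The class name ExternalCommand and its run method are hypothetical names made up for this example, not part of Hadoop. It assumes the executable exists on every worker node and that the TaskTracker user is allowed to run it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Hadoop class): runs an external program
// and returns the lines it printed.
final class ExternalCommand {

    private ExternalCommand() {
    }

    // Runs the given command (program followed by its arguments) and
    // collects its output line by line. Throws IOException if the program
    // cannot be started or exits with a non-zero code.
    static List<String> run(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout so a single reader drains both streams
        // and the child cannot block on a full stderr pipe.
        builder.redirectErrorStream(true);
        Process process = builder.start();

        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(),
                        StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }

        // The output stream is exhausted; now wait for the exit code.
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command " + command
                    + " exited with code " + exitCode + ": " + lines);
        }
        return lines;
    }
}
```

Inside a map() method this might be called as ExternalCommand.run(Arrays.asList("/path/to/tool", value.toString())), where /path/to/tool is a placeholder for the locally installed program. Starting one process per input record can be slow, so it may be worth batching records or keeping a single long-running child process per task.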

EDIT: Just found that Hadoop has a class explicitly for this purpose: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/Shell.html

Chris Shain answered Sep 28 '22 04:09