Elastic Map Reduce External Jars

So, it is easy enough to handle external JARs when using Hadoop directly: the -libjars option does this for you. The question is how to do this with EMR. There must be an easy way. I thought the -cachefile option of the CLI would do it, but I couldn't get it working. Any ideas?
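For reference, this is roughly what the plain-Hadoop -libjars invocation looks like. It is a small Python sketch that only builds the command line. The JAR, class and path names are hypothetical. Keep in mind that -libjars is a generic option parsed by GenericOptionsParser, so it only takes effect when the driver runs through ToolRunner (or otherwise uses GenericOptionsParser), and it must come before the job's own arguments.

```python
import shlex


def libjars_command(app_jar, main_class, dep_jars, job_args):
    # Generic options such as -libjars are consumed by GenericOptionsParser,
    # so they go right after the main class and before the job's own
    # arguments; the dependency JARs are passed as one comma-separated list.
    return ["hadoop", "jar", app_jar, main_class,
            "-libjars", ",".join(dep_jars), *job_args]


# Hypothetical file and class names, for illustration only.
cmd = libjars_command("myapp.jar", "com.example.MyJob",
                      ["dep-a.jar", "dep-b.jar"], ["input/", "output/"])
print(shlex.join(cmd))
# hadoop jar myapp.jar com.example.MyJob -libjars dep-a.jar,dep-b.jar input/ output/
```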

Thanks for the help.

asked Jun 14 '11 at 00:06 by delmet


2 Answers

The best luck I have had with external JAR dependencies is to copy them (via a bootstrap action) to /home/hadoop/lib on every node in the cluster. That directory is on the classpath of every host. This technique is the only one that seems to work regardless of where the code that uses the external JARs runs (tool, job, or task).
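A minimal Python sketch of the bootstrap-action approach. It assumes an older EMR AMI where /home/hadoop/lib is on the classpath, as the answer describes. It also assumes the Hadoop client on each node can read from S3 at bootstrap time. The sketch builds the text of a shell script that you would upload to S3. It also builds the matching bootstrap-action entry, in the shape that boto3's EMR `run_job_flow` call accepts in its `BootstrapActions` list. The bucket, keys and function names are hypothetical.

```python
def copy_jars_script(s3_jar_dir, lib_dir="/home/hadoop/lib"):
    # Bootstrap actions run on every node, so the copy lands on each host.
    # The S3 glob is single-quoted so bash passes it through untouched and
    # `hadoop fs` expands it (assumption: your AMI's client supports this).
    src = s3_jar_dir.rstrip("/") + "/*.jar"
    return "\n".join([
        "#!/bin/bash",
        "set -e",
        f"mkdir -p {lib_dir}",
        f"hadoop fs -copyToLocal '{src}' {lib_dir}/",
    ]) + "\n"


def bootstrap_action(script_s3_path, name="Copy external JARs"):
    # One entry for the BootstrapActions list of boto3's EMR run_job_flow().
    return {
        "Name": name,
        "ScriptBootstrapAction": {"Path": script_s3_path, "Args": []},
    }


# Hypothetical bucket and key names.
script = copy_jars_script("s3://my-bucket/jars/")
action = bootstrap_action("s3://my-bucket/bootstrap/copy-jars.sh")
print(script)
```

Generating the script and the request entry separately keeps the S3 layout in one place. The same `action` dict can then be reused for every cluster you launch.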

answered Nov 15 '22 at 12:11 by Judge Mental



One option is to have the first step in your job flow copy the JARs to wherever they need to be. Or, if they are dependencies, you can package them into your application JAR (which is probably in S3).
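One way to package dependencies with the application JAR is to nest them under a lib/ directory inside the job JAR. As far as I know, `hadoop jar` unpacks the job JAR and puts lib/*.jar on the classpath, so nested JARs travel with the application. The Python sketch below shows this using the standard `zipfile` module. It is illustrative only, and the function name is hypothetical. A build tool (for example, an "uber JAR" plugin) is the more usual way to do this.

```python
import zipfile
from pathlib import Path


def add_dependency_jars(job_jar, dep_jars):
    """Append each dependency JAR to job_jar under lib/.

    Returns the archive names actually added; JARs already present under
    lib/ are skipped so repeated runs do not create duplicate entries.
    """
    added = []
    with zipfile.ZipFile(job_jar, "a") as jar:  # a JAR is a ZIP archive
        existing = set(jar.namelist())
        for dep in map(Path, dep_jars):
            arcname = f"lib/{dep.name}"
            if arcname in existing:
                continue
            jar.write(dep, arcname)
            existing.add(arcname)
            added.append(arcname)
    return added
```

After packaging, upload the single JAR to S3 and reference it from your job flow step as usual.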

answered Nov 15 '22 at 13:11 by ajduff574
