How does Hadoop's RunJar method distribute class/jar files across nodes?

I'm trying to use JIT compilation in Clojure to generate mapper and reducer classes on the fly. However, these classes aren't being recognized by the JobClient (it's the usual ClassNotFoundException).

If I AOT compile the Mapper, Reducer, and Tool and run the job using RunJar, everything seems fine. After looking through the source, it seems that RunJar unpacks the jar and creates a custom URLClassLoader, which it uses to load the "main" implementation. What I'm not seeing is how the jar is distributed across nodes, or even how it gets used in a one-node cluster.

Any help would be much appreciated!

Jieren asked Aug 09 '10 22:08


2 Answers

First, when we submit a job's jar, it gets copied to the staging directory configured in the JobTracker's properties. Then, when a TaskTracker is assigned the job (by the scheduler, of course), it copies the jar from the staging directory and executes it.

In case you want to supply an external jar for execution, you can do so with Hadoop's DistributedCache facility.

Prashant Sharma answered Oct 22 '22 04:10


Clojure has something in common with other Java scripting options such as BeanShell, Groovy, and Ant: if you use the classloading features of the scripting language, then when your script launches it decouples itself from the default classloader, and your code ends up running on the scripting engine's custom classloader. I have no idea what is causing your error, but keep in mind that if you're doing anything at all in your script that would cause a custom classloader to be used in place of the JVM's default classloader, that might explain a few things.

In my experience I couldn't overcome these problems, so with BeanShell, for example, I had to stop using the classloader options and specify my entire classpath on the command line that starts the JVM. That way I knew the script used the default classloader and that all classes would be found.

Another example, this time with Groovy, given these two source files and a class A that instantiates B:

classes/groovy/A.groovy

classes/groovy/B.groovy

 public class A {
    public A() {
       B b = new B()
    }
 }

In that case GroovyClassLoader would not load the Groovy class B. The same kind of failure can be reproduced by trying to load a JDBC driver with Class.forName from within a custom classloader (rather than the default classloader).
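
To make the visibility problem concrete, here is a minimal sketch in plain Java. It is not Hadoop or Clojure code, and the names LoaderVisibilityDemo and OnTheFlyMapper are made up for illustration. It assumes you run it on a JDK, since it uses the javax.tools compiler to build a throwaway class in a temp directory that is not on the classpath, then asks both a custom URLClassLoader and the default (system) classloader whether they can resolve that class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

class LoaderVisibilityDemo {

    // Hypothetical class name; it stands in for a class generated at runtime.
    static final String CLASS_NAME = "OnTheFlyMapper";

    /** Compiles a trivial class into a fresh temp directory that is NOT on the classpath. */
    static Path compileOutsideClasspath() {
        try {
            Path dir = Files.createTempDirectory("loader-demo");
            Path src = dir.resolve(CLASS_NAME + ".java");
            String code = "public class " + CLASS_NAME + " {}";
            Files.write(src, code.getBytes(StandardCharsets.UTF_8));
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("No system compiler; run this on a JDK");
            }
            // Write the .class file next to the source, inside the temp directory.
            int status = compiler.run(null, null, null, "-d", dir.toString(), src.toString());
            if (status != 0) {
                throw new IllegalStateException("Compilation failed with status " + status);
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if the named class can be resolved through the given loader. */
    static boolean visible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Returns { visible via the custom loader, visible via the default loader }. */
    static boolean[] probe() {
        Path dir = compileOutsideClasspath();
        ClassLoader defaultLoader = ClassLoader.getSystemClassLoader();
        // The custom loader delegates to the default one first, then searches the temp directory.
        try (URLClassLoader custom =
                 new URLClassLoader(new URL[] { dir.toUri().toURL() }, defaultLoader)) {
            return new boolean[] { visible(CLASS_NAME, custom), visible(CLASS_NAME, defaultLoader) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = probe();
        System.out.println("custom loader sees " + CLASS_NAME + ": " + result[0]);
        System.out.println("default loader sees " + CLASS_NAME + ": " + result[1]);
    }
}
```

On a JDK this should report that the custom loader can see the class while the default loader cannot. Any code that resolves the class name through the default loader would fail with a ClassNotFoundException in the same way, which is why putting everything on the JVM's classpath sidesteps the problem.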

djangofan answered Oct 22 '22 05:10
