The Spark standalone cluster looks like it's running without a problem:
http://i.stack.imgur.com/gF1fN.png
I followed this tutorial.
I have built a fat jar for running this Java app on the cluster. Before running maven package, the project looks like this:
find .
./pom.xml
./src
./src/main
./src/main/java
./src/main/java/SimpleApp.java
The content of SimpleApp.java is:
import org.apache.spark.api.java.*;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;

public class SimpleApp {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setMaster("spark://10.35.23.13:7077")
                .setAppName("My app")
                .set("spark.executor.memory", "1g");
        JavaSparkContext sc = new JavaSparkContext(conf);

        String logFile = "/home/ubuntu/spark-0.9.1/test_data";
        JavaRDD<String> logData = sc.textFile(logFile).cache();

        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("a"); }
        }).count();

        System.out.println("Lines with a: " + numAs);
    }
}
This program only works when the master is set with setMaster("local"). Otherwise I get this error:
$java -cp path_to_file/simple-project-1.0-allinone.jar SimpleApp
http://i.stack.imgur.com/doRSn.png
There's an anonymous class (extending Function) in the SimpleApp.java file. It is compiled to SimpleApp$1, which must be shipped to each worker in the Spark cluster.
The simplest way to do that is to add the jar explicitly to the Spark context. Add something like sparkContext.addJar("path_to_file/simple-project-1.0-allinone.jar") right after the JavaSparkContext is created, and rebuild your jar file. The main Spark program (called the driver program) will then automatically deliver your application code to the cluster.
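For illustration, here is a minimal sketch of the driver from the question with that one change applied; the jar path is the one from the question's command line, so adjust it to wherever your fat jar actually lives:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SimpleApp {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setMaster("spark://10.35.23.13:7077")
                .setAppName("My app")
                .set("spark.executor.memory", "1g");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Ship the fat jar (which also contains SimpleApp$1) to every worker.
        sc.addJar("path_to_file/simple-project-1.0-allinone.jar");

        String logFile = "/home/ubuntu/spark-0.9.1/test_data";
        JavaRDD<String> logData = sc.textFile(logFile).cache();

        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("a"); }
        }).count();

        System.out.println("Lines with a: " + numAs);
    }
}

With the jar registered on the context, the workers can load SimpleApp$1 when the filter runs, so the job no longer needs master set to "local".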