I have the following configuration for an Apache Spark 1.2.1 Standalone cluster. I submit my job to the Standalone cluster manager as:
./spark-submit --class com.b2b.processor.ProcessSampleJSONFileUpdate \
--conf num-executors=2 \
--executor-memory 2g \
--driver-memory 3g \
--deploy-mode cluster \
--supervise \
--master spark://abc.xyz.net:7077 \
hdfs://abc:9000/b2b/b2bloader-1.0.jar ds6_2000/*.json
My job executes successfully, i.e. it reads data from the files and inserts it into Cassandra.
The Spark documentation says that in Standalone mode an application uses all available cores by default, but my cluster is using only 1 core per application. Also, after starting the application, the Spark UI shows Applications: 0 Running and Drivers: 1 Running.
My question is: why does my application use only one core, and why does the Spark UI show no running application?
The code:
public static void main(String[] args) throws Exception {
    String fileName = args[0];
    System.out.println("-----> Filename : " + fileName);

    Long now = new Date().getTime();
    SparkConf conf = new SparkConf(true)
            .setMaster("local")
            .setAppName("JavaSparkSQL_" + now)
            .set("spark.executor.memory", "1g")
            .set("spark.cassandra.connection.host", "192.168.1.65")
            .set("spark.cassandra.connection.native.port", "9042")
            .set("spark.cassandra.connection.rpc.port", "9160");

    JavaSparkContext ctx = new JavaSparkContext(conf);

    JavaRDD<String> input = ctx.textFile("hdfs://abc.xyz.net:9000/dataLoad/resources/" + fileName, 6);
    JavaRDD<DataInput> result = input.mapPartitions(new ParseJson()).filter(new FilterLogic());

    System.out.println("Count --> " + result.count());
    System.out.println(StringUtils.join(result.collect(), ","));

    javaFunctions(result)
            .writerBuilder("ks", "pt_DataInput", mapToRow(DataInput.class))
            .saveToCassandra();
}
If you set the master in your app to local (via .setMaster("local")), the application will not connect to spark://abc.xyz.net:7077; instead the job runs in single-threaded local mode inside the driver JVM. That is why the UI shows Drivers: 1 Running but Applications: 0 Running, and why only one core is used. You don't need to set the master in the app at all if you are supplying it with the spark-submit command.
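
A minimal sketch of the fix, assuming the same Cassandra and HDFS settings as in the question (host names, ports, and the surrounding job code are the asker's): build the SparkConf without setMaster() so that the --master spark://abc.xyz.net:7077 flag passed to spark-submit takes effect.

import java.util.Date;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ProcessSampleJSONFileUpdate {
    public static void main(String[] args) throws Exception {
        // No setMaster() here: the master URL passed to spark-submit
        // (--master spark://abc.xyz.net:7077) is picked up automatically,
        // so the job runs on the cluster instead of in local mode.
        SparkConf conf = new SparkConf(true)
                .setAppName("JavaSparkSQL_" + new Date().getTime())
                .set("spark.cassandra.connection.host", "192.168.1.65")
                .set("spark.cassandra.connection.native.port", "9042")
                .set("spark.cassandra.connection.rpc.port", "9160");

        JavaSparkContext ctx = new JavaSparkContext(conf);
        // ... rest of the job exactly as in the question ...
    }
}

Leaving the master unset in code also means the same jar can be tested locally with --master local[*] and then deployed to the cluster with --master spark://abc.xyz.net:7077, without recompiling.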