
Why does Spark Standalone cluster not use all available cores?

I have the following configuration for an Apache Spark 1.2.1 standalone cluster:

  • Hadoop 2.6.0
  • 2 nodes (one master, one slave) in the standalone cluster
  • a 3-node Cassandra cluster
  • total cores: 6 (2 on the master, 4 on the slave)
  • total memory: 13 GB

I submit the job to the standalone cluster manager as follows:

./spark-submit --class com.b2b.processor.ProcessSampleJSONFileUpdate \
               --conf spark.executor.instances=2 \
               --executor-memory 2g \
               --driver-memory 3g \
               --deploy-mode cluster \
               --supervise \
               --master spark://abc.xyz.net:7077 \
               hdfs://abc:9000/b2b/b2bloader-1.0.jar ds6_2000/*.json

The job executes successfully: it reads data from the files and inserts it into Cassandra.

The Spark documentation says that in standalone mode an application uses all available cores by default, but my cluster is using only 1 core per application. Also, after the application starts, the Spark UI shows Applications: 0 Running and Drivers: 1 Running.

My questions are:

  1. Why is it not using all 6 available cores?
  2. Why does the Spark UI show Applications: 0 Running?

The code:

public static void main(String[] args) throws Exception {

    String fileName = args[0];
    System.out.println("----->Filename : " + fileName);

    Long now = new Date().getTime();

    SparkConf conf = new SparkConf(true)
            .setMaster("local")
            .setAppName("JavaSparkSQL_" + now)
            .set("spark.executor.memory", "1g")
            .set("spark.cassandra.connection.host", "192.168.1.65")
            .set("spark.cassandra.connection.native.port", "9042")
            .set("spark.cassandra.connection.rpc.port", "9160");

    JavaSparkContext ctx = new JavaSparkContext(conf);

    JavaRDD<String> input = ctx.textFile("hdfs://abc.xyz.net:9000/dataLoad/resources/" + fileName, 6);
    JavaRDD<DataInput> result = input.mapPartitions(new ParseJson()).filter(new FilterLogic());

    System.out.print("Count --> " + result.count());
    System.out.println(StringUtils.join(result.collect(), ","));

    javaFunctions(result).writerBuilder("ks", "pt_DataInput", mapToRow(DataInput.class)).saveToCassandra();
}
asked Jan 09 '23 by Abhinandan Satpute

1 Answer

If you set the master in your app to local (via .setMaster("local")), the application will never connect to spark://abc.xyz.net:7077. Properties set directly on the SparkConf take precedence over flags passed to spark-submit, so your --master flag is ignored and everything runs in a single local JVM. That is why only one core is used, and why the UI shows Applications: 0 Running alongside Drivers: 1 Running: the driver was launched on the cluster, but the application itself never registered with the standalone master.

You don't need to set the master in the app at all when you pass it to the spark-submit command; a sketch of the corrected setup follows.
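
As a minimal sketch (reusing the question's own settings unchanged), the SparkConf can simply omit setMaster(...) so that the --master URL given to spark-submit takes effect:

// Identical to the question's configuration, minus setMaster("local"),
// so the --master spark://abc.xyz.net:7077 passed to spark-submit is honored.
SparkConf conf = new SparkConf(true)
        .setAppName("JavaSparkSQL_" + now)
        .set("spark.executor.memory", "1g")
        .set("spark.cassandra.connection.host", "192.168.1.65")
        .set("spark.cassandra.connection.native.port", "9042")
        .set("spark.cassandra.connection.rpc.port", "9160");

JavaSparkContext ctx = new JavaSparkContext(conf);

Leaving the master out of the code also keeps the same jar usable with local[*] for testing and with the cluster URL in production, chosen at submit time.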

answered Jan 30 '23 by eliasah