
Low CPU usage while running a Spark job

I am running a Spark job. I have 4 cores and worker memory set to 5G. The application master is on another machine in the same network and does not host any workers. This is my code:

private void myClass() {
    // configuration of the Spark context
    SparkConf conf = new SparkConf()
            .setAppName("myWork")
            .setMaster("spark://myHostIp:7077")
            .set("spark.driver.allowMultipleContexts", "true");
    // creation of the Spark context in which we will run the algorithm
    JavaSparkContext sc = new JavaSparkContext(conf);

    // algorithm
    for (int i = 0; i < 200; i++) {
        System.out.println("===============================================================");
        System.out.println("iteration : " + i);
        System.out.println("===============================================================");
        // build the input: 1900 dummy elements split into 100 partitions
        ArrayList<Boolean> list = new ArrayList<Boolean>();
        for (int j = 0; j < 1900; j++) {
            list.add(true);
        }
        JavaRDD<myObj> ratings = sc.parallelize(list, 100)
                    .map(bool -> new myObj())
                    .map(obj -> this.setupObj(obj))
                    .map(obj -> this.moveObj(obj))
                    .cache();
        int[] stuff = ratings
                    .map(obj -> obj.getStuff())
                    .reduce((obj1, obj2) -> this.mergeStuff(obj1, obj2));
        this.setStuff(stuff);

        ArrayList<TabObj> tabObj = ratings
                    .map(obj -> this.objToTabObjAsTab(obj))
                    .reduce((obj1, obj2) -> this.mergeTabObj(obj1, obj2));
        ratings.unpersist(false);

        this.setTabObj(tabObj);
    }

    sc.close();
}

When I start it, I can see progress on the Spark UI, but it is really slow (I have to set the parallelize partition count quite high, otherwise I run into a timeout issue). I thought it was a CPU bottleneck, but the JVM's CPU consumption is actually very low (most of the time it is 0%, sometimes a bit more than 5%...).

The JVM is using around 3G of memory according to the monitor, with only 19M cached.

The master host has 4 cores, and less memory (4G). That machine shows 100% CPU consumption (a full core) and I don't understand why it is that high... It just has to send partitions to the worker on the other machine, right?

Why is CPU consumption low on the worker, and high on the master?

DeepProblems asked Jul 07 '17 13:07

People also ask

Is Spark memory intensive?

Spark relies heavily on cluster memory (RAM) as it performs parallel computing in memory across nodes to reduce the I/O and execution times of tasks. It is important to configure the Spark application appropriately based on data and processing requirements for it to be successful.
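
For example, executor memory can be set when the application builds its configuration. Below is a minimal sketch; the "4g" value and the app name are illustrative choices, not recommendations:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    // Minimal sketch: sizing executor memory up front.
    // "4g" is an illustrative value, not a tuned recommendation.
    SparkConf conf = new SparkConf()
            .setAppName("memory-config-example")
            .set("spark.executor.memory", "4g");
    JavaSparkContext sc = new JavaSparkContext(conf);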

Why do Spark jobs fail?

In Spark, stage failures happen when there's a problem with processing a Spark task. These failures can be caused by hardware issues, incorrect Spark configurations, or code problems. When a stage failure occurs, the Spark driver logs report an exception similar to the following: org.

How many executors should I use Spark?

Five executors with 3 cores, or three executors with 5 cores. The consensus in most Spark tuning guides is that 5 cores per executor is the optimal number of cores for parallel processing.
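
Expressed as configuration, that rule of thumb could look like the sketch below; the values are illustrative, and spark.executor.instances is only honored by resource managers such as YARN:

    import org.apache.spark.SparkConf;

    // Illustrative sketch: three executors with five cores each,
    // following the 5-cores-per-executor rule of thumb.
    SparkConf conf = new SparkConf()
            .setAppName("executor-sizing-example")
            .set("spark.executor.instances", "3")  // executor count (YARN)
            .set("spark.executor.cores", "5");     // cores per executor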

What is JVM CPU usage?

A JVM may max out on CPU usage because of the incoming workload. The server capacity may not be sized sufficiently to handle the rate of requests coming in and in such a situation, the Java application may be doing work, trying to keep up with the workload.


1 Answer

  1. Make sure you submit your Spark job to the cluster through YARN or Mesos; otherwise it may run only on your master node (see the spark-submit sketch after this list).

  2. As your code is pretty simple, it should finish the computation very fast, but I suggest using the word-count example to read a few GB of input data and test what the CPU consumption looks like.

  3. For local testing, use "local[*]"; the * means Spark will use all of your cores for computation:

    SparkConf sparkConf = new SparkConf()
            .set("spark.driver.host", "localhost")
            .setAppName("unit-testing")
            .setMaster("local[*]");

    Reference: https://spark.apache.org/docs/latest/configuration.html

  4. In Spark there are a lot of things that can influence CPU and memory usage, such as the number of executors and how much spark.executor.memory you want to give each one; the sketch after this list shows both.
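
As a concrete illustration of points 1 and 4, a cluster submission could look like the sketch below. The resource numbers, class name, and jar are hypothetical placeholders, not values taken from the question:

    # Hypothetical values: adjust executors, cores, and memory to your cluster.
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 3 \
      --executor-cores 5 \
      --executor-memory 4g \
      --class com.example.MyWork \
      myWork.jar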

SharpLu answered Sep 19 '22 16:09