 

Apache Spark : TaskResultLost (result lost from block manager) Error On cluster

I have a Spark standalone cluster with 3 slaves on VirtualBox. My code is in Java, and it works fine with my small input datasets, whose total size is around 100MB.

I set my virtual machines' RAM to 16GB, but when I ran my code on a big input file (about 2GB), I got this error after hours of processing, in my reduce part:

Job aborted due to stage failure: Total size of serialized results of 4 tasks (4.3GB) is bigger than spark.driver.maxResultSize

I edited spark-defaults.conf and assigned a higher value (2GB, then 4GB) to spark.driver.maxResultSize. It didn't help, and the same error showed up.
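For reference, the relevant line in spark-defaults.conf looks roughly like this (shown here with the 4GB value):

    # spark-defaults.conf
    spark.driver.maxResultSize   4g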

Now I am trying 8GB for spark.driver.maxResultSize, and my spark.driver.memory is the same as the RAM size (16GB). But I get this error:

TaskResultLost (result lost from block manager)

Any comments about this?

I don't know whether the problem is caused by the large maxResultSize or by something in how the RDDs are collected in the code. I have also included the mapper part of the code for better understanding.


JavaRDD<Boolean[][][]> fragPQ = uData.map(new Function<String, Boolean[][][]>() {
        public Boolean[][][] call(String s) {
            // Each input record produces two 11000 x 11000 Boolean matrices,
            // all initialized to true.
            Boolean[][][] PQArr = new Boolean[2][][];
            PQArr[0] = new Boolean[11000][];
            PQArr[1] = new Boolean[11000][];
            for (int i = 0; i < 11000; i++) {
                PQArr[0][i] = new Boolean[11000];
                PQArr[1][i] = new Boolean[11000];
                for (int j = 0; j < 11000; j++) {
                    PQArr[0][i][j] = true;
                    PQArr[1][i][j] = true;
                }
            }
            return PQArr;
        }
    });
asked Jan 06 '23 by payamf1


2 Answers

In general, this error shows that you are collecting/bringing a large amount of data onto the driver. This should never be done. You need to rethink your application logic.
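In this case, each mapped element is a 2 × 11000 × 11000 Boolean structure (about 242 million entries), so even a few such task results overwhelm the driver. As a minimal sketch of the alternative (the countTrue helper and the output path are hypothetical, not from the question), either reduce to a small summary on the executors or write the full results out from the executors instead of collecting them:

    // Sketch only, assuming the fragPQ RDD from the question and a hypothetical
    // countTrue(Boolean[][][]) helper that returns the number of true cells.

    // Bring back only a small aggregate instead of the full matrices:
    long totalTrue = fragPQ
            .map(pq -> countTrue(pq))
            .reduce((a, b) -> a + b);

    // Or persist the full results directly from the executors
    // (the HDFS path is a placeholder):
    fragPQ.saveAsObjectFile("hdfs:///tmp/fragPQ-output");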

Also, you don't need to modify spark-defaults.conf to set the property. Instead, you can specify such application-specific properties via the --conf option of spark-shell or spark-submit, depending on how you run the job.
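For example, when submitting the job, something along these lines (the class name and jar are placeholders):

    spark-submit \
      --conf spark.driver.maxResultSize=4g \
      --conf spark.driver.memory=16g \
      --class your.main.Class \
      your-application.jar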

answered Jan 07 '23 by ShirishT


SOLVED:

The problem was solved by increasing the master's RAM size. I studied my case and found that, based on my design, assigning 32GB of RAM would be sufficient. After doing that, my program works fine and calculates everything correctly.

answered Jan 07 '23 by payamf1