I ran into a problem when executing my code in spark-shell.
[Stage 1:> (0 + 0) / 16]
17/01/13 06:09:24 WARN TaskSetManager: Stage 1 contains a task of very large size (1057 KB). The maximum recommended task size is 100 KB.
[Stage 1:> (0 + 4) / 16]
After this warning the execution hangs.
How can I solve it?
I tried this, but it doesn't solve the problem:
val conf = new SparkConf()
  .setAppName("MyApp")
  .setMaster("local[*]")
  .set("spark.driver.maxResultSize", "3g")
  .set("spark.executor.memory", "3g")
val sc = new SparkContext(conf)
I had a similar error:
scheduler.TaskSetManager: Stage 2 contains a task of very large size
(34564 KB). The maximum recommended task size is 100 KB
My input data was ~150 MB with 4 partitions, i.e. each partition was roughly 35 MB, which explains the 34564 KB task size mentioned in the error message above.
Reason: A task is the smallest unit of work in Spark, and it acts on a partition of your input data. So if Spark warns that a task's size is larger than recommended, it means the partition that task is handling contains too much data.
Solution that worked for me:
reduce the task size => reduce the data each task handles => increase numPartitions to break the data into smaller chunks (see the sketch below)
df.rdd.getNumPartitions              // check the current partition count
val repartitionedDf = df.repartition(100)   // repartition returns a new DataFrame
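For concreteness, here is a minimal sketch of that approach. The DataFrame df, the input path, and the target of 100 partitions are placeholders, not values from the original post; pick a partition count that keeps each partition well under the task-size warning threshold.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("RepartitionSketch")
  .master("local[*]")
  .getOrCreate()

// Placeholder input; substitute your own data source.
val df = spark.read.json("/path/to/input.json")

// Check how many partitions the data currently has.
println(s"partitions before: ${df.rdd.getNumPartitions}")

// repartition returns a NEW DataFrame; it does not change df in place.
val repartitioned = df.repartition(100)
println(s"partitions after: ${repartitioned.rdd.getNumPartitions}")

// Run the heavy work on the repartitioned DataFrame so each task
// handles a smaller chunk of data.
repartitioned.count()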
It's most likely caused by large variables being captured and serialized into your tasks. The accepted answer to this question should help you.
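One common instance of this: a large local collection (for example a lookup map built on the driver) gets captured by the task closure and shipped with every single task, which inflates the task size. Broadcasting it sends it to each executor only once. A minimal sketch of that idea, with a made-up lookup map purely for illustration:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("BroadcastSketch")
  .master("local[*]")
  .getOrCreate()
val sc = spark.sparkContext

// Hypothetical large lookup table built on the driver.
val bigLookup: Map[Int, String] = (1 to 100000).map(i => i -> s"value_$i").toMap

// Without broadcast, bigLookup would be serialized into every task closure.
// Broadcasting ships it to each executor only once.
val lookupBc = sc.broadcast(bigLookup)

val result = sc.parallelize(1 to 100, numSlices = 16)
  .map(k => lookupBc.value.getOrElse(k, "missing"))
  .collect()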