increase task size spark [duplicate]

I get a problem when I execute my code in spark-shell:

[Stage 1:>             (0 + 0) / 16]
17/01/13 06:09:24 WARN TaskSetManager: Stage 1 contains a task of very large size (1057 KB). The maximum recommended task size is 100 KB.
[Stage 1:>             (0 + 4) / 16]

After this warning, the execution hangs.

How can I solve it?

I tried this, but it doesn't solve the problem:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
    .setAppName("MyApp")
    .setMaster("local[*]")
    .set("spark.driver.maxResultSize", "3g")
    .set("spark.executor.memory", "3g")
val sc = new SparkContext(conf)
asked Dec 18 '22 by user7375007


2 Answers

I had a similar error:

scheduler.TaskSetManager: Stage 2 contains a task of very large size
(34564 KB). The maximum recommended task size is 100 KB

My input data was ~150 MB with 4 partitions (i.e., each partition was roughly 37 MB). That explains the 34564 KB size mentioned in the error message above.

Reason: a task is the smallest unit of work in Spark and operates on a partition of your input data. So if Spark warns that a task's size exceeds the recommended size, it means the partition it is handling contains too much data.

Solution that worked for me:

Reducing the task size => reducing the data each task handles => increasing numPartitions to break the data into smaller chunks (see the sketch after this list).
  • So, I increased the number of partitions and got rid of the error.
  • You can check the number of partitions of a DataFrame via df.rdd.getNumPartitions.
  • To increase the number of partitions: df.repartition(100)
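
For reference, here is a minimal spark-shell sketch of this approach; the input path and the partition count 100 are only placeholders:

// `spark` is the SparkSession provided by spark-shell; the path is hypothetical
val df = spark.read.json("/path/to/input.json")

// check how many partitions the DataFrame currently has
println(df.rdd.getNumPartitions)

// spread the same data across more, smaller partitions so each task handles less
val repartitioned = df.repartition(100)
println(repartitioned.rdd.getNumPartitions)   // 100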
answered Dec 28 '22 by Sruthi Poddutur


It is most likely caused by the large size of the variables referenced by your tasks. The accepted answer to this question should help you.
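
One common fix for that case is to broadcast the large variable instead of letting it be captured in the task closure. This is only a minimal sketch under that assumption; bigLookup and its contents are hypothetical:

// `sc` is the SparkContext provided by spark-shell
// a large driver-side structure; referencing it directly inside a closure ships a copy with every task
val bigLookup: Map[Int, String] = (1 to 1000000).map(i => i -> ("value" + i)).toMap

// broadcast it once; executors fetch it separately instead of receiving it inside each serialized task
val bigLookupBc = sc.broadcast(bigLookup)

val result = sc.parallelize(1 to 100)
  .map(i => bigLookupBc.value.getOrElse(i, "missing"))
  .collect()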

answered Dec 28 '22 by code