 

Is garbage collection time part of execution time of a task in apache spark?

Tags:

apache-spark

I am a beginner in Apache Spark and came across the garbage collection (GC) time of tasks in the Apache Spark web UI. Does the execution time of a task include the task's garbage collection time?

Rajasekhar Mekala asked Jan 31 '23 02:01



1 Answer

The answer is yes: the garbage collection time shown in the Spark UI is part of the task's total execution time. If GC is taking more time than the actual computation, you should review what your job is doing.
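A minimal sketch, in plain Python with no Spark dependency, of how the two numbers relate: GC time is a slice of the task's run time, so the "real" work is the difference between them. The field names mirror `executorRunTime` and `jvmGcTime` from the task metrics reported by Spark's monitoring REST API (assumed here to be in milliseconds), but the `TaskMetrics` class, the `gc_dominated` helper and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskMetrics:
    """Hypothetical container mirroring two per-task metrics from the Spark UI."""
    task_id: int
    executor_run_time_ms: int  # time spent running the task, GC pauses included
    jvm_gc_time_ms: int        # part of that time the JVM spent in garbage collection

    @property
    def non_gc_time_ms(self) -> int:
        # "Real" work = run time minus the GC slice it contains.
        return self.executor_run_time_ms - self.jvm_gc_time_ms

    @property
    def gc_fraction(self) -> float:
        # Share of the task's run time spent in GC (0.0 for an empty task).
        if self.executor_run_time_ms == 0:
            return 0.0
        return self.jvm_gc_time_ms / self.executor_run_time_ms


def gc_dominated(tasks):
    """Return the tasks whose GC time exceeds their non-GC work time."""
    return [t for t in tasks if t.jvm_gc_time_ms > t.non_gc_time_ms]


# Hypothetical sample numbers, not taken from a real job.
sample = [
    TaskMetrics(0, 1200, 100),
    TaskMetrics(1, 5000, 3200),
    TaskMetrics(2, 800, 0),
]

for t in sample:
    print(f"task {t.task_id}: {t.gc_fraction:.0%} of run time in GC")
print("GC-dominated tasks:", [t.task_id for t in gc_dominated(sample)])
```

With these made-up numbers only task 1 is flagged: it spends 3200 ms of its 5000 ms run time in GC, more than the 1800 ms of actual work.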

If you are facing problems with GC, there are many ways to improve Spark's memory usage or how garbage collection is managed.

According to the Databricks blog, long GC time is a recurring problem for any large company that uses gigabytes of memory to run its tasks:

For example, garbage collection takes a long time, causing program to experience long delays, or even crash in severe cases.

You can see the full text here.

You can also look at how to tune your Spark application to reduce GC time and to avoid "GC overhead limit exceeded" or out-of-memory (OOM) errors during execution.

Please check this part of the documentation.
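As one hedged example of such tuning, the helper below assembles a `spark-submit` command that switches executors to the G1 collector and turns on GC logging through `spark.executor.extraJavaOptions`, so GC activity shows up in the executor logs. The helper name, the app path, the memory size and the `jdk_major` switch are illustrative assumptions: the `-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps` flags are the JDK 8 style, while JDK 9 and later use unified logging (`-Xlog:gc*`).

```python
import shlex


def gc_java_opts(jdk_major: int) -> list:
    """Executor JVM flags: G1 collector plus GC logging (choice of flags is an assumption)."""
    opts = ["-XX:+UseG1GC"]
    if jdk_major >= 9:
        opts.append("-Xlog:gc*")  # unified GC logging, JDK 9+
    else:
        # Legacy JDK 8 GC logging flags.
        opts += ["-verbose:gc", "-XX:+PrintGCDetails", "-XX:+PrintGCTimeStamps"]
    return opts


def spark_submit_args(app: str, executor_memory: str = "4g", jdk_major: int = 17) -> list:
    """Build a hypothetical spark-submit argument list with GC tuning/logging enabled."""
    java_opts = " ".join(gc_java_opts(jdk_major))
    return [
        "spark-submit",
        # Heap size goes through --executor-memory, not -Xmx in extraJavaOptions.
        "--executor-memory", executor_memory,
        "--conf", f"spark.executor.extraJavaOptions={java_opts}",
        app,
    ]


if __name__ == "__main__":
    # my_job.py is a placeholder application path.
    print(shlex.join(spark_submit_args("my_job.py", jdk_major=8)))
```

The heap size is passed via `--executor-memory` because Spark does not allow setting the maximum heap size (`-Xmx`) through `extraJavaOptions`.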

Thiago Baldim answered May 24 '23 12:05
