
Monitoring the Memory Usage of Spark Jobs

How can we get the overall memory used by a Spark job? I am not able to find the exact parameter to refer to for this. I have looked at the Spark UI, but I am not sure which field to use. In Ganglia, the following options are available:

a) Memory Buffer
b) Cache Memory
c) Free Memory
d) Shared Memory
e) Free Swap Space

None of these seems to correspond to memory used. Does anyone have any idea about this?

asked Sep 21 '16 by Sumit Khurana


1 Answer

If you persist your RDDs, you can see how big they are in memory via the Storage tab of the Spark UI.
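For example, here is a minimal Scala sketch, assuming an active SparkContext `sc` (e.g. in spark-shell); the input path and RDD name are hypothetical, and `getRDDStorageInfo` is a developer API whose output may vary between Spark versions:

```scala
import org.apache.spark.storage.StorageLevel

// Hypothetical input; any persisted RDD will show up in the UI's Storage tab.
val lines = sc.textFile("hdfs:///data/input.txt")
val cached = lines.map(_.toUpperCase)
  .setName("upper-lines")              // label it so it is easy to spot in the UI
  .persist(StorageLevel.MEMORY_ONLY)
cached.count()                         // materialize the RDD so it actually gets cached

// Programmatic alternative to the Storage tab (a DeveloperApi):
sc.getRDDStorageInfo.foreach { info =>
  println(s"RDD ${info.name} (id=${info.id}): " +
    s"${info.memSize} bytes in memory, ${info.diskSize} bytes on disk")
}
```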

It's hard to get an idea of how much memory is being used for intermediate work (e.g. for shuffles). Basically, Spark will use as much memory as it needs, given what's available. This means that if your RDDs take up more than 50% of the available memory, your application may slow down because fewer resources are left for execution.
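If you want a rough programmatic number rather than reading the UI, one option is `SparkContext.getExecutorMemoryStatus`; note this is only a sketch, and the figures cover block-storage memory per executor, not total JVM or shuffle memory:

```scala
// For each executor: (max memory available for caching, memory remaining).
sc.getExecutorMemoryStatus.foreach { case (executor, (maxMem, remainingMem)) =>
  val usedMem = maxMem - remainingMem
  println(f"$executor: ${usedMem / 1e6}%.1f MB used of ${maxMem / 1e6}%.1f MB for storage")
}
```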

answered Oct 10 '22 by Graham S