Cleaning up Spark history logs

Tags:

apache-spark

We have a long-running EMR cluster to which we submit Spark jobs. Over time, HDFS fills up with Spark application logs, which, as far as I can tell, sometimes causes EMR/YARN to mark a host as unhealthy.

Running hadoop fs -ls -R -h / produces the listing in [1], which shows that no application logs have ever been deleted.

We have set spark.history.fs.cleaner.enabled to true (and confirmed this in the Spark UI), and were hoping that the remaining defaults, a cleaner interval (spark.history.fs.cleaner.interval) of 1 day and a cleaner max age (spark.history.fs.cleaner.maxAge) of 7 days, as documented at http://spark.apache.org/docs/latest/monitoring.html#spark-configuration-options, would take care of cleaning up these logs. That has not happened.

Any ideas?

[1]

-rwxrwx---   2 hadoop spark      543.1 M 2017-01-11 13:13 /var/log/spark/apps/application_1484079613665_0001
-rwxrwx---   2 hadoop spark        7.8 G 2017-01-17 10:51 /var/log/spark/apps/application_1484079613665_0002.inprogress
-rwxrwx---   2 hadoop spark        1.4 G 2017-01-18 08:11 /var/log/spark/apps/application_1484079613665_0003
-rwxrwx---   2 hadoop spark        2.9 G 2017-01-20 07:41 /var/log/spark/apps/application_1484079613665_0004
-rwxrwx---   2 hadoop spark      125.9 M 2017-01-20 09:57 /var/log/spark/apps/application_1484079613665_0005
-rwxrwx---   2 hadoop spark        4.4 G 2017-01-23 10:19 /var/log/spark/apps/application_1484079613665_0006
-rwxrwx---   2 hadoop spark        6.6 M 2017-01-23 10:31 /var/log/spark/apps/application_1484079613665_0007
-rwxrwx---   2 hadoop spark       26.4 M 2017-01-23 11:09 /var/log/spark/apps/application_1484079613665_0008
-rwxrwx---   2 hadoop spark       37.4 M 2017-01-23 11:53 /var/log/spark/apps/application_1484079613665_0009
-rwxrwx---   2 hadoop spark      111.9 M 2017-01-23 13:57 /var/log/spark/apps/application_1484079613665_0010
-rwxrwx---   2 hadoop spark        1.3 G 2017-01-24 10:26 /var/log/spark/apps/application_1484079613665_0011
-rwxrwx---   2 hadoop spark        7.0 M 2017-01-24 10:37 /var/log/spark/apps/application_1484079613665_0012
-rwxrwx---   2 hadoop spark       50.7 M 2017-01-24 11:40 /var/log/spark/apps/application_1484079613665_0013
-rwxrwx---   2 hadoop spark       96.2 M 2017-01-24 13:27 /var/log/spark/apps/application_1484079613665_0014
-rwxrwx---   2 hadoop spark      293.7 M 2017-01-24 17:58 /var/log/spark/apps/application_1484079613665_0015
-rwxrwx---   2 hadoop spark        7.6 G 2017-01-30 07:01 /var/log/spark/apps/application_1484079613665_0016
-rwxrwx---   2 hadoop spark        1.3 G 2017-01-31 02:59 /var/log/spark/apps/application_1484079613665_0017
-rwxrwx---   2 hadoop spark        2.1 G 2017-02-01 12:04 /var/log/spark/apps/application_1484079613665_0018
-rwxrwx---   2 hadoop spark        2.8 G 2017-02-03 08:32 /var/log/spark/apps/application_1484079613665_0019
-rwxrwx---   2 hadoop spark        5.4 G 2017-02-07 02:03 /var/log/spark/apps/application_1484079613665_0020
-rwxrwx---   2 hadoop spark        9.3 G 2017-02-13 03:58 /var/log/spark/apps/application_1484079613665_0021
-rwxrwx---   2 hadoop spark        2.0 G 2017-02-14 11:13 /var/log/spark/apps/application_1484079613665_0022
-rwxrwx---   2 hadoop spark        1.1 G 2017-02-15 03:49 /var/log/spark/apps/application_1484079613665_0023
-rwxrwx---   2 hadoop spark        8.8 G 2017-02-21 05:42 /var/log/spark/apps/application_1484079613665_0024
-rwxrwx---   2 hadoop spark      371.2 M 2017-02-21 11:54 /var/log/spark/apps/application_1484079613665_0025
-rwxrwx---   2 hadoop spark        1.4 G 2017-02-22 09:17 /var/log/spark/apps/application_1484079613665_0026
-rwxrwx---   2 hadoop spark        3.2 G 2017-02-24 12:36 /var/log/spark/apps/application_1484079613665_0027
-rwxrwx---   2 hadoop spark        9.5 M 2017-02-24 12:48 /var/log/spark/apps/application_1484079613665_0028
-rwxrwx---   2 hadoop spark       20.5 G 2017-03-10 04:00 /var/log/spark/apps/application_1484079613665_0029
-rwxrwx---   2 hadoop spark        7.3 G 2017-03-10 04:04 /var/log/spark/apps/application_1484079613665_0030.inprogress
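To see how much space these logs actually take, the listing in [1] can be totalled with a short script. This is a minimal sketch that assumes the column layout shown above (permissions, replication, owner, group, human-readable size, date, time, path); parse_listing and summarize are hypothetical helpers, not part of Hadoop or Spark.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# `-h` prints sizes with binary prefixes (1 K = 1024 bytes).
_UNITS = {"": 1, "K": 1024, "k": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}

# Regular-file row: permissions, replication, owner, group, size, date, time, path.
# Directory rows start with "d", so the leading "-" skips them.
_FILE_ROW = re.compile(
    r"^-\S*\s+(\d+)\s+\S+\s+\S+\s+([\d.]+)\s*([KkMGTP]?)\s+"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2})\s+(\S+)$"
)


@dataclass
class LogFile:
    path: str
    replication: int
    size_bytes: float  # approximate, because `-h` output is rounded
    modified: str      # "YYYY-MM-DD HH:MM", as printed
    in_progress: bool


def parse_listing(text: str) -> list[LogFile]:
    """Parse the regular-file rows of `hadoop fs -ls -R -h` output."""
    files = []
    for line in text.splitlines():
        m = _FILE_ROW.match(line.strip())
        if m is None:
            continue  # blank line, directory row, or unexpected format
        replication, number, unit, modified, path = m.groups()
        files.append(LogFile(
            path=path,
            replication=int(replication),
            size_bytes=float(number) * _UNITS[unit],
            modified=modified,
            in_progress=path.endswith(".inprogress"),
        ))
    return files


def summarize(files: list[LogFile]) -> dict[str, float]:
    """Logical size, raw HDFS usage (size x replication), and in-progress size."""
    return {
        "logical_bytes": sum(f.size_bytes for f in files),
        "raw_bytes": sum(f.size_bytes * f.replication for f in files),
        "in_progress_bytes": sum(f.size_bytes for f in files if f.in_progress),
    }
```

Feed it the text of hadoop fs -ls -R -h /var/log/spark/apps. The raw_bytes figure multiplies each file by its replication column (2 here), since HDFS stores that many copies of every block.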
Swaranga Sarma asked Mar 15 '17 at 18:03




1 Answer

I ran into this issue on emr-5.4.0. After setting spark.history.fs.cleaner.interval to 1h, I was able to get the cleaner to run.

For reference, here is the end of my spark-defaults.conf file:

spark.history.fs.cleaner.enabled true
spark.history.fs.cleaner.maxAge  12h
spark.history.fs.cleaner.interval 1h
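If you prefer to script the edit rather than change the file by hand, the settings above can be merged into an existing spark-defaults.conf. This is a minimal sketch: set_properties and CLEANER_SETTINGS are hypothetical names, and it assumes whitespace-separated "key value" lines like the ones shown. On EMR the file is usually /etc/spark/conf/spark-defaults.conf, but treat that path as an assumption and check it on your release.

```python
from __future__ import annotations

# The cleaner settings from the answer above.
CLEANER_SETTINGS = {
    "spark.history.fs.cleaner.enabled": "true",
    "spark.history.fs.cleaner.maxAge": "12h",
    "spark.history.fs.cleaner.interval": "1h",
}


def set_properties(conf_text: str, props: dict[str, str]) -> str:
    """Return conf_text with every key in props set to its value.

    Assumes whitespace-separated "key value" lines. The first uncommented line
    for a key is rewritten in place, later duplicates of that key are dropped,
    and keys not present at all are appended at the end.
    """
    remaining = dict(props)
    out = []
    for line in conf_text.splitlines():
        stripped = line.strip()
        is_setting = bool(stripped) and not stripped.startswith("#")
        key = stripped.split(None, 1)[0] if is_setting else None
        if key in remaining:
            out.append(f"{key} {remaining.pop(key)}")
        elif key in props:
            continue  # duplicate of a key that was already rewritten
        else:
            out.append(line)  # comments, blank lines, unrelated settings
    out.extend(f"{key} {value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"
```

Read the file, pass its contents through set_properties(text, CLEANER_SETTINGS), write the result back (this usually needs root), and then restart the history server so it picks up the new values.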

After making the change, restart the Spark History Server.

Another clarification: setting these values when submitting an application, i.e. via spark-submit --conf, has no effect, because they are read by the history server rather than by your application. Either set them at cluster creation time through the EMR configuration API, or edit spark-defaults.conf manually and restart the Spark History Server.

Also note that logs for an application that is still running are not cleaned up. For instance, a long-running Spark Streaming job will not have any logs deleted for its current run, and they will keep accumulating. The next time the job restarts (maybe because of a deployment), the older logs will be cleaned up.
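The behaviour described above can be modelled in a few lines to reason about which files a cleaner pass should remove. This is a simplified sketch of the rule as the answer states it (finished logs older than maxAge go, .inprogress logs stay); it is not Spark's actual history-server code, whose details vary between versions, and parse_duration and logs_to_clean are hypothetical helpers.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta

# Only the suffixes needed for values like "1h", "12h", "1d", "7d", plus seconds
# and minutes; Spark itself accepts more units than this partial parser does.
_UNIT_TO_KWARG = {"s": "seconds", "m": "minutes", "min": "minutes", "h": "hours", "d": "days"}


def parse_duration(value: str) -> timedelta:
    """Convert a duration string such as "12h" or "7d" to a timedelta."""
    m = re.fullmatch(r"(\d+)\s*(s|m|min|h|d)", value.strip())
    if m is None:
        raise ValueError(f"unsupported duration: {value!r}")
    amount, unit = m.groups()
    return timedelta(**{_UNIT_TO_KWARG[unit]: int(amount)})


def logs_to_clean(logs: dict[str, datetime], now: datetime, max_age: timedelta) -> list[str]:
    """Simplified model of one cleaner pass: logs of finished applications last
    modified more than max_age ago are eligible; ".inprogress" logs of running
    applications are left alone."""
    cutoff = now - max_age
    return sorted(
        name
        for name, modified in logs.items()
        if not name.endswith(".inprogress") and modified < cutoff
    )
```

Under this model, with "now" set to the date of the question and maxAge at 7d, application_1484079613665_0001 (last modified 2017-01-11) is eligible, while application_1484079613665_0030.inprogress is kept no matter how old it is, which matches the accumulation the answer warns about for long-running jobs.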

Ferris Tseng answered Sep 22 '22 at 09:09