We are running certain Spark jobs and we see the .sparkStaging directory in HDFS persisting after job completion. Is there any parameter we need to set to delete the staging directory after the job completes?
spark.yarn.preserve.staging.files is false by default, so we have not set it explicitly. We are running Spark on YARN on Hortonworks, with Spark version 1.2.
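For reference, a minimal sketch of how we could set the property explicitly in a Scala job (the app name and the sample workload are placeholders, not our actual code):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object StagingCleanupExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("staging-cleanup-example")              // hypothetical app name
      .set("spark.yarn.preserve.staging.files", "false")  // the default; kept explicit here
    val sc = new SparkContext(conf)
    try {
      // Trivial workload so the application actually creates a staging directory on YARN.
      println(s"count = ${sc.parallelize(1 to 100).count()}")
    } finally {
      // Stopping the context cleanly gives YARN a chance to remove .sparkStaging on shutdown.
      sc.stop()
    }
  }
}
```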
Regards, Manju
Please check for the following log events in the job completion console output to get more insights into what's going on:
ApplicationMaster: Deleting staging directory .sparkStaging/application_xxxxxx_xxxx
- This means the application was able to successfully clean up the staging directory.

ApplicationMaster: Staging directory is null
- This means the application was not able to find the staging directory for this application.

ApplicationMaster: Failed to cleanup staging dir .sparkStaging/application_xxxxxx_xxxx
- This means something went wrong while deleting the staging directory.

Could you also double-check these properties in the cluster, which can affect the scenario you mentioned: spark.yarn.preserve.staging.files and SPARK_YARN_STAGING_DIR.
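If it helps, here is a rough sketch for listing any leftover staging directories; it assumes the default location under the submitting user's HDFS home directory, so adjust the path if SPARK_YARN_STAGING_DIR points elsewhere:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object StagingDirAudit {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    // Assumed default location for per-application staging directories on YARN.
    val stagingRoot = new Path(fs.getHomeDirectory, ".sparkStaging")
    if (fs.exists(stagingRoot)) {
      // Each leftover subdirectory belongs to an application whose staging files
      // were not removed (or to one that is still running).
      fs.listStatus(stagingRoot).foreach(status => println(s"Leftover: ${status.getPath}"))
    } else {
      println(s"No staging directory found at $stagingRoot")
    }
  }
}
```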