 

Why does Spark report spark.SparkException: File ./someJar.jar exists and does not match contents of

Tags:

apache-spark

I sometimes see the following error message when running Spark jobs:

13/10/21 21:27:35 INFO cluster.ClusterTaskSetManager: Loss was due to spark.SparkException: File ./someJar.jar exists and does not match contents of ...

What does this mean? How do I diagnose and fix this?

asked Sep 07 '14 by samthebest

2 Answers

After digging around in the logs I found "no space left on device" exceptions too. When I then ran df -h and df -i on every node, I found a partition that was full. Interestingly, this partition did not appear to be used for data, but for storing jars temporarily; its name was something like /var/run or /run.

The solution was to clean the old files out of the partition and to set up some automated cleaning. I think setting spark.cleaner.ttl to, say, one day (86400 seconds) should prevent it from happening again.
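The diagnosis step above can be sketched as a quick check for partitions that are running out of blocks or inodes (a minimal sketch; the 90% threshold is an arbitrary choice, and you would run this on each node, e.g. via ssh):

```shell
# Flag any mounted filesystem at or above 90% block usage.
# df -h prints: Filesystem Size Used Avail Use% Mounted-on,
# so $5 is the usage percentage and $6 is the mount point;
# "$5+0" coerces a value like "93%" to the number 93 in awk.
df -h | awk 'NR > 1 && $5+0 >= 90 { print "nearly full:", $6, $5 }'

# Same idea for inode exhaustion, which also produces
# "no space left on device" even when blocks remain free.
df -i | awk 'NR > 1 && $5+0 >= 90 { print "out of inodes:", $6, $5 }'
```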

answered Sep 22 '22 by samthebest


Running on AWS EC2, I periodically encounter disk-space issues even after setting spark.cleaner.ttl to a few hours (we iterate quickly). I decided to solve them by moving the /root/spark/work directory onto the instance's mounted ephemeral disk (I'm using r3.large instances, which have a 32 GB ephemeral disk at /mnt):

readonly HOST=some-ec2-hostname-here

# Stop the cluster, then on each slave replace the work directory
# with a symlink onto the ephemeral disk, and restart.
ssh -t root@$HOST spark/sbin/stop-all.sh
ssh -t root@$HOST "for SLAVE in \$(cat /root/spark/conf/slaves) ; do ssh \$SLAVE 'rm -rf /root/spark/work && mkdir /mnt/work && ln -s /mnt/work /root/spark/work' ; done"
ssh -t root@$HOST spark/sbin/start-all.sh

As far as I can tell as of Spark 1.5 the work directory still does not make use of the mounted storage by default. I haven't tinkered with the deployment settings enough to see if this is even configurable.
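For what it's worth, Spark's standalone mode documents a SPARK_WORKER_DIR variable in conf/spark-env.sh that sets the worker scratch directory directly, which may achieve the same effect without the symlink (a sketch only; verify against your Spark version's standalone-mode docs, since the answer above notes this behavior may differ by version):

```shell
# conf/spark-env.sh -- sourced by the standalone launch scripts.
# Point worker scratch space (jars, shuffle output) at the
# ephemeral disk instead of the default spark/work directory.
export SPARK_WORKER_DIR=/mnt/work
```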

answered Sep 21 '22 by cfeduke