I have just set up a Spark multi-node cluster. The cluster consists of an iMac and a couple of Raspberry Pis, all linked via Ethernet with passwordless SSH access to one another.
The Spark command I'm trying to execute is:
spark-submit --master spark://10.0.0.20:7077 rdd/WordCount.py
My slave nodes are 10.0.0.10 and 10.0.0.11.
The code exits with the error shown in the following snippet of the log:
21/01/13 13:54:38 INFO Utils: Fetching ftp://myuser:mypassword@my-NAS-IP:21/Projects/Corso-Spark/word_count.text to /private/var/folders/0s/gkptv9tn6h100zv3m17ctsd400yjj9/T/spark-5c31c0e5-6385-4945-928a-3883332189ac/userFiles-abf87986-8096-4bf4-a9e5-44fc6a3d5676/fetchFileTemp8028573497969255747.tmp
...
21/01/13 13:54:54 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 10.0.0.10, executor 1): java.io.FileNotFoundException: File file:/private/var/folders/0s/gkptv9tn6h100zv3m17ctsd400yjj9/T/spark-5c31c0e5-6385-4945-928a-3883332189ac/userFiles-abf87986-8096-4bf4-a9e5-44fc6a3d5676/word_count.text does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:428)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:142)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:346)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:282)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:281)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:239)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:96)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
The file word_count.text is retrieved inside the Python script via FTP at this URL:
"ftp://myuser:mypassword@my-NAS-IP:21/Projects/Corso-Spark/word_count.text"
Apparently, the file is fetched on the master into the /private/var/folders/0s/gkptv9tn6h100zv3m17ctsd400yjj9/T/spark-5c31c0e5-6385-4945-928a-3883332189ac/userFiles-abf87986-8096-4bf4-a9e5-44fc6a3d5676 directory, and then Spark tries to read the same file from the same path on the slaves. Of course, on the slaves that path does not exist. Why?
Can somebody help? Thank you in advance.
[SOLVED]: what none of the tutorials I found online mention is that the exact same path where the input file is fetched on the master has to be mounted on each and every worker as well.
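An alternative that avoids shared mounts entirely is to download the file on the driver with the standard library and then parallelize its lines, so the executors never open a driver-local path. A minimal sketch, using the same hypothetical FTP URL as above:

from urllib.request import urlopen
from pyspark import SparkContext

sc = SparkContext(appName="WordCount")

# Fetch the whole file on the driver; urllib handles ftp:// URLs.
text = urlopen("ftp://myuser:mypassword@my-NAS-IP:21/Projects/Corso-Spark/word_count.text").read().decode("utf-8")

# Ship the file's contents to the cluster instead of a filesystem path.
lines = sc.parallelize(text.splitlines())

counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
print(counts.collect())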