I have just set up a Spark multi-node cluster. The cluster consists of an iMac and a couple of Raspberry Pis, all linked via Ethernet with passwordless SSH access to one another.
The Spark command I'm trying to execute is:
spark-submit --master spark://10.0.0.20:7077 rdd/WordCount.py
My slave nodes are 10.0.0.10 and 10.0.0.11.
The code exits with the error shown in the following snippet of the log:
21/01/13 13:54:38 INFO Utils: Fetching ftp://myuser:mypassword@my-NAS-IP:21/Projects/Corso-Spark/word_count.text to /private/var/folders/0s/gkptv9tn6h100zv3m17ctsd400yjj9/T/spark-5c31c0e5-6385-4945-928a-3883332189ac/userFiles-abf87986-8096-4bf4-a9e5-44fc6a3d5676/fetchFileTemp8028573497969255747.tmp
...
21/01/13 13:54:54 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 10.0.0.10, executor 1): java.io.FileNotFoundException: File file:/private/var/folders/0s/gkptv9tn6h100zv3m17ctsd400yjj9/T/spark-5c31c0e5-6385-4945-928a-3883332189ac/userFiles-abf87986-8096-4bf4-a9e5-44fc6a3d5676/word_count.text does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:428)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:142)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:346)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:282)
at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:281)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:239)
at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:96)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:65)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
The file word_count.text is retrieved inside the Python script via FTP at this URL:
"ftp://myuser:mypassword@my-NAS-IP:21/Projects/Corso-Spark/word_count.text"
Apparently, the file is fetched on the master into the /private/var/folders/0s/gkptv9tn6h100zv3m17ctsd400yjj9/T/spark-5c31c0e5-6385-4945-928a-3883332189ac/userFiles-abf87986-8096-4bf4-a9e5-44fc6a3d5676 directory, and then Spark tries to read the same file from the same path on the slaves. Of course, on the slaves that path does not exist. Why?
Can somebody help? Thank you in advance.
[SOLVED]: what none of the tutorials I found online mention is that the exact same path where the input file is fetched on the master has to be mounted on each and every worker as well.
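An alternative that avoids shared mounts entirely is to download the file on the driver with the standard library and then parallelize its lines, so the executors never open a driver-local path. A minimal sketch, using the same hypothetical FTP URL as above:

from urllib.request import urlopen
from pyspark import SparkContext

sc = SparkContext(appName="WordCount")

# Fetch the whole file on the driver; urllib handles ftp:// URLs.
text = urlopen("ftp://myuser:mypassword@my-NAS-IP:21/Projects/Corso-Spark/word_count.text").read().decode("utf-8")

# Ship the file's contents to the cluster instead of a filesystem path.
lines = sc.parallelize(text.splitlines())

counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
print(counts.collect())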