Error when reading a file in Spark

Question

I'm having a hard time figuring out why Spark cannot access a file that I add to the context. Below is my code in the REPL:

scala> sc.addFile("/home/ubuntu/my_demo/src/main/resources/feature_matrix.json")

scala> val featureFile = sc.textFile(SparkFiles.get("feature_matrix.json"))

featureFile: org.apache.spark.rdd.RDD[String] = /tmp/spark/ubuntu/spark-d7a13d92-2923-4a04-a9a5-ad93b3650167/feature_matrix.json MappedRDD[1] at textFile at <console>:60

scala> featureFile.first()
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: cfs://172.30.26.95/tmp/spark/ubuntu/spark-d7a13d92-2923-4a04-a9a5-ad93b3650167/feature_matrix.json

The file does in fact exist at /tmp/spark/ubuntu/spark-d7a13d92-2923-4a04-a9a5-ad93b3650167/feature_matrix.json

Any help appreciated.

Justin Pihony · Accepted Answer

If you add a file with addFile, then you need to use SparkFiles.get to retrieve it. Also, the addFile method is lazy, so it is quite possible that the file was not put in the location where you are finding it until you actually called first, which makes the situation circular.
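The exception in the question shows the path being looked up under cfs://, which suggests textFile resolved it against the cluster's default filesystem rather than the local disk. One way to sidestep that, sketched below, is to call SparkFiles.get inside a task and read the file with plain local I/O. The object name and app name are placeholders, and this assumes the job is launched through spark-submit so that the master is supplied externally.

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}
import scala.io.Source

// Hypothetical object name; only addFile and SparkFiles.get come from the question.
object ReadAddedFile {
  def main(args: Array[String]): Unit = {
    // Assumption: the master URL is passed in by spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("read-added-file"))

    sc.addFile("/home/ubuntu/my_demo/src/main/resources/feature_matrix.json")

    // Resolve the path inside the task, so it refers to the copy on the
    // node that actually runs the task, and read it as a local file.
    val lines = sc.parallelize(Seq(1), 1).mapPartitions { _ =>
      val localPath = SparkFiles.get("feature_matrix.json")
      val src = Source.fromFile(localPath)
      // Materialize the lines before closing the underlying file.
      try src.getLines().toList.iterator finally src.close()
    }

    println(lines.first())
    sc.stop()
  }
}
```

This reads the whole file in a single task, which is reasonable for a small lookup file but not for a large dataset; for that, putting the file on the cluster filesystem and using textFile is probably the better fit.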

All that being said, I doubt that reading a SparkFiles path as the first action is ever a good idea. Use something like --files with spark-submit instead, and the files will be put in your working directory.
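A rough sketch of the --files route, assuming the file really does land in each executor's working directory as described above (this can depend on the cluster manager). The class name, package, and jar name are hypothetical; only the --files flag and the JSON path come from this thread.

```scala
// Hypothetical submit command (class and jar names are placeholders):
//   spark-submit --class demo.ReadShippedFile \
//     --files /home/ubuntu/my_demo/src/main/resources/feature_matrix.json \
//     my_demo.jar
package demo

import org.apache.spark.{SparkConf, SparkContext}
import scala.io.Source

object ReadShippedFile {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("read-shipped-file"))

    // Open the file by its bare name inside a task. This relies on the
    // assumption that --files placed it in the executor's working directory.
    val firstLine = sc.parallelize(Seq(1), 1).map { _ =>
      val src = Source.fromFile("feature_matrix.json")
      try {
        val it = src.getLines()
        if (it.hasNext) it.next() else "" // empty file yields an empty string
      } finally src.close()
    }.first()

    println(firstLine)
    sc.stop()
  }
}
```

If the bare file name is not found on your setup, resolving it with SparkFiles.get("feature_matrix.json") inside the task is a reasonable fallback.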

Tags: scala, cassandra, apache-spark, datastax-enterprise

worker1138
