
Apache Spark - check if file exists

I am new to Spark and I have a question. I have a two-step process in which the first step writes a SUCCESS.txt file to a location on HDFS. My second step, which is a Spark job, has to verify that the SUCCESS.txt file exists before it starts processing the data.

I checked the Spark API and didn't find any method that checks whether a file exists. Any ideas on how to handle this?

The only method I found was sc.textFile("hdfs:///SUCCESS.txt").count(), which throws an exception when the file does not exist. I would have to catch that exception and write my program accordingly. I don't really like this approach and am hoping to find a better alternative.

Chandra asked May 22 '15 20:05



1 Answer

For a file in HDFS, you can use the Hadoop FileSystem API directly:

val conf = sc.hadoopConfiguration
val fs = org.apache.hadoop.fs.FileSystem.get(conf)
val exists = fs.exists(new org.apache.hadoop.fs.Path("/path/on/hdfs/to/SUCCESS.txt"))
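The same check can be wrapped in a small helper so the second step can test for the marker before reading any data. This is only a sketch: the names MarkerCheck and markerExists are made up for illustration, and it assumes the Hadoop client classes are on the classpath, as they are in a normal Spark deployment. It resolves the file system from the path itself with Path.getFileSystem, so a location with an explicit scheme such as hdfs:// or file:// is checked on that file system rather than only on the default one.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper; the object and method names are illustrative only.
object MarkerCheck {
  // Returns true if `location` exists on the file system named by its URI
  // scheme, or on the configured default file system if it has no scheme.
  def markerExists(conf: Configuration, location: String): Boolean = {
    val path = new Path(location)
    val fs = path.getFileSystem(conf) // picks the file system matching the path
    fs.exists(path)
  }
}
```

In the Spark job this could be called as MarkerCheck.markerExists(sc.hadoopConfiguration, "/path/on/hdfs/to/SUCCESS.txt"), with processing started only when it returns true.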
DPM answered Sep 19 '22 07:09
