Can anyone suggest the best way to check whether a file exists in PySpark?
Currently I am using the method below; please advise.
def path_exist(path):
    # Try to read the path as ORC; if the read fails, assume the path is missing.
    try:
        df = sparkSqlCtx.read.format("orc").load(path)
        df.take(1)
        return True
    except Exception:
        return False
You can use the Java API org.apache.hadoop.fs.{FileSystem, Path} via Py4J:
# Get a handle on the Hadoop FileSystem configured for this Spark session
jvm = spark_session._jvm
jsc = spark_session._jsc
fs = jvm.org.apache.hadoop.fs.FileSystem.get(jsc.hadoopConfiguration())

if fs.exists(jvm.org.apache.hadoop.fs.Path("/foo/bar")):
    print("/foo/bar exists")
else:
    print("/foo/bar does not exist")