I am trying to check whether a file exists before reading it with PySpark in Databricks, to avoid exceptions. I tried the code snippet below, but I get an exception when the file is not present:
from pyspark.sql import *
from pyspark.conf import SparkConf

SparkSession.builder.config(conf=SparkConf())

try:
    df = sqlContext.read.format('com.databricks.spark.csv').option("delimiter", ",").options(header='true', inferschema='true').load('/FileStore/tables/HealthCareSample_dumm.csv')
    print("File Exists")
except IOError:
    print("file not found")
When the file is present, it reads the file and prints "File Exists", but when the file is not there it throws "AnalysisException: 'Path does not exist: dbfs:/FileStore/tables/HealthCareSample_dumm.csv;'".
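The load fails with AnalysisException rather than IOError, so the except clause never matches. A minimal sketch of catching the right exception, assuming a Databricks notebook where spark is already defined:

from pyspark.sql.utils import AnalysisException

try:
    # Same read as above, expressed with the built-in CSV reader
    df = spark.read.option("header", "true").option("inferSchema", "true").csv('/FileStore/tables/HealthCareSample_dumm.csv')
    print("File Exists")
except AnalysisException:
    # Raised when the path does not exist
    print("file not found")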
To check if a file or folder exists we can use the os.path.exists() function, which accepts the path to the file or directory as an argument and returns a boolean based on whether the path exists. Note: a path is the unique location of a file or directory in a filesystem.
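A minimal sketch of that approach; os.path.exists() only sees the driver's local filesystem, so the /dbfs mount prefix used here is an assumption about how the workspace exposes DBFS:

import os

# Address the DBFS file through the local /dbfs mount point (assumption)
if os.path.exists("/dbfs/FileStore/tables/HealthCareSample_dumm.csv"):
    print("File Exists")
else:
    print("file not found")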
In this article, we are going to check whether a PySpark DataFrame or Dataset is empty or not. There are multiple ways to check: the isEmpty function of the DataFrame or Dataset returns true when the DataFrame is empty and false when it is not. If the DataFrame is empty, invoking isEmpty might result in a NullPointerException.
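A minimal sketch of an emptiness check once the read has succeeded; DataFrame.isEmpty() is available in recent PySpark releases, while len(df.head(1)) == 0 works as a fallback on older ones:

df = spark.read.option("header", "true").csv('/FileStore/tables/HealthCareSample_dumm.csv')

# head(1) returns at most one Row; an empty list means the DataFrame has no rows
if len(df.head(1)) == 0:
    print("DataFrame is empty")
else:
    print("DataFrame has rows")

# On PySpark 3.3+ the same check can be written as df.isEmpty()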
The FileSystem method exists does not support wildcards in the file path when checking existence. You can instead use globStatus, which supports special pattern-matching characters like *. If it returns a non-empty list then the file exists; otherwise it does not:
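A minimal sketch of a wildcard check through the JVM gateway, assuming a SparkSession named spark; the DBFS pattern is only an example:

# Build a Hadoop FileSystem handle from the session's configuration
sc = spark.sparkContext
hadoop = sc._jvm.org.apache.hadoop.fs
fs = hadoop.FileSystem.get(sc._jsc.hadoopConfiguration())

# globStatus accepts wildcard patterns such as *, which exists() does not
statuses = fs.globStatus(hadoop.Path("/FileStore/tables/HealthCareSample_*.csv"))
file_exists = statuses is not None and len(statuses) > 0
print("File Exists" if file_exists else "file not found")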
To load data in Spark and add the filename as a DataFrame column, or to store the whole path in a variable, you can achieve this with a combination of dbutils and regular-expression pattern matching. We can use dbutils.fs.ls(path) to return the list of files present in a folder (a storage account or DBFS).
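A minimal existence check built on dbutils, assuming a Databricks notebook where dbutils is available; the helper name is just for illustration:

def file_exists_dbutils(path):
    # dbutils.fs.ls raises an exception when the path does not exist
    try:
        dbutils.fs.ls(path)
        return True
    except Exception:
        return False

file_exists_dbutils("/FileStore/tables/HealthCareSample_dumm.csv")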
Thanks @Dror and @Kini. I run Spark on a cluster, and I had to add sc._jvm.java.net.URI.create("s3://" + path.split("/")[2]); here s3 is the prefix of your cluster's file system.
def path_exists(path):
    # spark is a SparkSession
    sc = spark.sparkContext
    # Resolve the file system for the bucket named in the path (e.g. s3://bucket)
    fs = sc._jvm.org.apache.hadoop.fs.FileSystem.get(
        sc._jvm.java.net.URI.create("s3://" + path.split("/")[2]),
        sc._jsc.hadoopConfiguration(),
    )
    # Ask Hadoop whether the exact path exists
    return fs.exists(sc._jvm.org.apache.hadoop.fs.Path(path))
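For example, the helper can be called with a full S3 URI (the bucket and key here are hypothetical):

path_exists("s3://my-bucket/data/HealthCareSample_dumm.csv")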
# Variant that uses the default file system from the Hadoop configuration
fs = sc._jvm.org.apache.hadoop.fs.FileSystem.get(sc._jsc.hadoopConfiguration())
fs.exists(sc._jvm.org.apache.hadoop.fs.Path("path/to/SUCCESS.txt"))