In Databricks, check whether a path exists or not

Question

I am reading CSV files from a Data Lake Store and passing multiple paths to the reader, but if any one of those paths does not exist, the read throws an exception. I want to avoid this exception.

Hauke Mallow · Accepted Answer

I think if you pass several paths to the reader at once, the whole read will fail as soon as one of them does not exist. Perhaps you could try a different approach.
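If you would rather keep an explicit list of paths, one option is to drop the missing ones before calling the reader. Below is a minimal sketch, assuming that `dbutils.fs.ls` raises an exception whose message contains `FileNotFoundException` when a path is missing (worth confirming on your runtime). The helper names `path_exists` and `existing_paths` are hypothetical, and the listing function is passed in as `ls` so the logic can be tried outside Databricks.

```python
from typing import Callable, Iterable, List


def path_exists(path: str, ls: Callable[[str], object]) -> bool:
    """Return True if ls(path) succeeds, False if the path is reported missing.

    Assumption: a missing path makes `ls` raise an exception whose message
    mentions FileNotFoundException, as dbutils.fs.ls typically does.
    """
    try:
        ls(path)
        return True
    except Exception as exc:  # the JVM error arrives wrapped, so catch broadly
        if "FileNotFoundException" in str(exc):
            return False
        raise  # other failures (permissions, bad mount) should still surface


def existing_paths(paths: Iterable[str], ls: Callable[[str], object]) -> List[str]:
    """Keep only the paths for which path_exists() is True, in input order."""
    return [p for p in paths if path_exists(p, ls)]


# Assumed usage in a Databricks notebook:
# paths = existing_paths(["/mnt/adls2/demo/corr", "/mnt/adls2/demo/deci"], dbutils.fs.ls)
# if paths:
#     df = spark.read.csv(paths)
```

Re-raising anything that is not a not-found error keeps genuine problems, such as a permissions error, visible instead of silently skipping the path.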

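If the paths you need are subfolders of one common directory, you could instead list that directory and keep only the subfolders you want, so that only paths that actually exist are passed to the reader.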

List the contents of the parent directory:

# list all subfolders and files in the directory demo
entries = dbutils.fs.ls("/mnt/adls2/demo")

Filter out the relevant sub-directories:

# keep entries whose path contains /corr or /deci and that live on dbfs:/
# (the parentheses group the two name checks before the prefix test)
paths = [
    e.path
    for e in entries
    if ('/corr' in e.path or '/deci' in e.path) and e.path.startswith('dbfs:/')
]

Use the resulting list to read the DataFrame:

# spark.read.csv accepts a list of paths; .json(paths) works the same way for JSON files
df = spark.read.csv(paths)
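A note on the filter condition: Python's `and` binds more tightly than `or`, so `A or B and C` is evaluated as `A or (B and C)`. Without explicit parentheses, a path containing `/corr` is selected even when it does not start with `dbfs:/`. The quick check below uses made-up paths to show the difference; the two function names are hypothetical.

```python
def select_unparenthesized(path: str) -> bool:
    # Parsed as: '/corr' in path or ('/deci' in path and path.startswith('dbfs:/'))
    return '/corr' in path or '/deci' in path and path.startswith('dbfs:/')


def select_parenthesized(path: str) -> bool:
    # Either name check may match, but the dbfs:/ prefix is always required
    return ('/corr' in path or '/deci' in path) and path.startswith('dbfs:/')


# Made-up paths for illustration only
candidates = [
    "dbfs:/mnt/adls2/demo/corr/",
    "dbfs:/mnt/adls2/demo/deci/",
    "s3://other-bucket/corr/",
    "dbfs:/mnt/adls2/demo/misc/",
]

for p in candidates:
    print(p, select_unparenthesized(p), select_parenthesized(p))
```

Only the third path differs: the unparenthesized version accepts it, the parenthesized one rejects it.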


Tags: exception, path, csv, load, databricks

Bilal Shafqat

