I am reading CSV files from a Data Lake store. I have multiple paths to read from, but if any one path does not exist, an exception is thrown. I want to avoid this exception.
If you pass multiple paths at once, the read will fail as soon as one of them does not exist. Perhaps you could try a different approach.
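In the general case, you can filter out the non-existent paths before reading. Here is a minimal sketch, assuming a Databricks notebook where dbutils is available; candidate_paths is a hypothetical list of the paths you want to read:

# keep only the paths that actually exist
# `candidate_paths` is a hypothetical list of input paths
def path_exists(path):
    try:
        dbutils.fs.ls(path)  # raises an exception if the path is missing
        return True
    except Exception:
        return False

existing_paths = [p for p in candidate_paths if path_exists(p)]
df = spark.read.csv(existing_paths)  # read only from paths known to exist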
For the given example, if you want to select only certain subfolders, you could try the following instead.
Read sub-directories of a given directory:
# list all subfolders and files in the directory demo
entries = dbutils.fs.ls("/mnt/adls2/demo")
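Each entry returned by dbutils.fs.ls is a FileInfo object that exposes, among other things, the full path, the name, and the size. To inspect what the listing looks like, as a quick sketch:

# print the attributes the filter below relies on
for entry in entries:
    print(entry.path, entry.name, entry.size)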
Keep only the relevant sub-directories:
paths = []
for entry in entries:
    subpath = entry.path
    # select dirs to read; the parentheses matter, since `and` binds tighter than `or`
    if ('/corr' in subpath or '/deci' in subpath) and subpath.startswith('dbfs:/'):
        paths.append(subpath)
Use the resulting list to read the dataframe:
df = (spark.read
      .csv(paths))  # use .json(paths) instead if the files are JSON
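Since only paths that actually exist end up in the list, the read no longer fails on missing directories. To sanity-check the result:

print(paths)      # the sub-directories that were selected
df.printSchema()  # confirm the data was read as expected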