I am currently listing files in Azure Data Lake Store Gen1 successfully with the following command:
dbutils.fs.ls('mnt/dbfolder1/projects/clients')
The structure of this folder is:
- client_comp_automotive_1.json [File]
- client_comp_automotive_2.json [File]
- client_comp_automotive_3.json [File]
- client_comp_automotive_4.json [File]
- PROCESSED [Folder]
I want to loop through those .json files in this folder and process them one by one, so that I can act on errors (or anything else) and move each successfully processed file to a subfolder.
How can I do this in Python? I have tried
folder = dbutils.fs.ls('mnt/dbfolder1/projects/clients')
files = [f for f in os.listdir(folder) if os.path.isfile(f)]
But this does not work: os is unknown. How can I do this within Databricks?
The answer was simple, even though I searched for two days:
files = dbutils.fs.ls('mnt/dbfolder1/projects/clients')
for fi in files:
    print(fi.path)
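Building on that listing, here is a minimal sketch of the full loop the question asks for: filter the listing to .json files, process each one, and move it into the PROCESSED subfolder only on success, using dbutils.fs.mv. The process_file helper is a hypothetical placeholder; swap in your real per-file logic.

src_dir = 'mnt/dbfolder1/projects/clients'
dst_dir = src_dir + '/PROCESSED'

def process_file(path):
    # hypothetical placeholder: put your real per-file logic here,
    # e.g. spark.read.json(path) plus whatever processing you need
    pass

for fi in dbutils.fs.ls(src_dir):
    # directory names returned by dbutils.fs.ls end with '/', so this
    # check also skips the PROCESSED folder
    if not fi.name.endswith('.json'):
        continue
    try:
        process_file(fi.path)
        # move the file into PROCESSED only after it was handled successfully
        dbutils.fs.mv(fi.path, dst_dir + '/' + fi.name)
    except Exception as e:
        print(f'failed to process {fi.path}: {e}')

Moving the file only inside the try block means a failed file stays in place, so rerunning the loop naturally picks it up again.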
Scala version of the same (with an ADLS Gen2 abfss:// path):
val dirList = dbutils.fs.ls("abfss://<container>@<storage_account>.dfs.core.windows.net/<DIR_PATH>/")
// option1
dirList.foreach(println)
// option2
for (dir <- dirList) println(dir.name)