Using Python/dbutils, how do you recursively display the files of the current directory and its subdirectories in the Databricks File System (DBFS)?
You can access the file system using magic commands such as %fs or %sh. You can also use the Databricks file system utility (dbutils.fs). Azure Databricks uses a FUSE mount to provide local access to files stored in the cloud.
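As a quick illustration (a minimal sketch assuming it runs inside a Databricks notebook, where dbutils is predefined and /databricks-datasets is the built-in sample-data mount), the same directory can be listed through all three interfaces:

# In a notebook cell, the %fs magic lists a DBFS path:
#   %fs ls /databricks-datasets
# Through the FUSE mount, the same path is visible to shell commands:
#   %sh ls /dbfs/databricks-datasets
# From Python, dbutils.fs.ls returns FileInfo objects with path, name and size:
for f in dbutils.fs.ls("/databricks-datasets"):
    print(f.name, f.size)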
When you use certain features, Azure Databricks puts files in folders under /FileStore. For example, /FileStore/jars contains libraries that you upload; if you delete files in this folder, libraries that reference them in your workspace may no longer work.
A surprising thing about dbutils.fs.ls (and the %fs magic command) is that it doesn't seem to support a recursive switch. However, since the ls function returns a list of FileInfo objects, it's quite trivial to recursively iterate over them to get the whole content, e.g.:
def get_dir_content(ls_path):
    # List the current level; dbutils.fs.ls returns a list of FileInfo objects.
    dir_paths = dbutils.fs.ls(ls_path)
    # Recurse into each subdirectory (skip the path itself to avoid looping on it).
    subdir_paths = [get_dir_content(p.path) for p in dir_paths if p.isDir() and p.path != ls_path]
    # Flatten the nested lists returned by the recursive calls.
    flat_subdir_paths = [p for subdir in subdir_paths for p in subdir]
    # Return the paths at this level plus everything found underneath.
    return [p.path for p in dir_paths] + flat_subdir_paths

paths = get_dir_content('/databricks-datasets/COVID/CORD-19/2020-03-13')
for p in paths:
    print(p)
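For very deep directory trees, the recursive version can eventually hit Python's recursion limit. Below is a minimal iterative sketch that walks the tree with an explicit stack instead; it assumes the same Databricks notebook environment where dbutils is predefined, and it uses the same sample dataset path as above.

from collections import deque

def get_dir_content_iter(root_path):
    # Collect every path by processing one directory at a time from a stack.
    paths = []
    stack = deque([root_path])
    while stack:
        current = stack.pop()
        for f in dbutils.fs.ls(current):
            paths.append(f.path)
            # Directory entries are pushed back onto the stack to be listed later.
            if f.isDir() and f.path != current:
                stack.append(f.path)
    return paths

for p in get_dir_content_iter('/databricks-datasets/COVID/CORD-19/2020-03-13'):
    print(p)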