I am running a Spark cluster, and when I execute the command below in a Databricks notebook, it gives me the output:
dbutils.fs.ls("/mnt/test_file.json")
[FileInfo(path=u'dbfs:/mnt/test_file.json', name=u'test_file.json', size=1083L)]
However, when I try to read that file, I get the error below:
with open("mnt/test_file.json", 'r') as f:
for line in f:
print line
IOError: [Errno 2] No such file or directory: 'mnt/test_file.json'
What might be the issue here? Any help/support is greatly appreciated.
The ls command (dbutils.fs.ls) lists the contents of a directory. To display help for this command, run dbutils.fs.help("ls").
You can access the file system using magic commands such as %fs or %sh. You can also use the Databricks file system utility (dbutils.fs). Databricks uses a FUSE mount to provide local access to files stored in the cloud.
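For example, the same mount can be listed either through dbutils or through the local /dbfs path exposed by the FUSE mount; a minimal sketch, assuming the /mnt mount from the question and a notebook where dbutils is predefined:

import os

# DBFS view of the mount, as used by dbutils and Spark
print(dbutils.fs.ls("/mnt/"))

# Local (FUSE) view of the same mount, as used by ordinary Python file APIs
print(os.listdir("/dbfs/mnt/"))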
If you use the Databricks Connect client library, you can read local files into memory on a remote Databricks Spark cluster. The alternative is to use the Databricks CLI (or REST API) to push local data to a location on DBFS, where it can be read into Spark from within a Databricks notebook.
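For instance, since your file already sits on DBFS, it can be read straight into a Spark DataFrame without going through local file APIs at all; a minimal sketch, assuming the JSON is line-delimited:

# Spark reads the dbfs: path directly, no /dbfs prefix needed
df = spark.read.json("dbfs:/mnt/test_file.json")
df.show()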
To access files on a DBFS mount using local file APIs, you need to prepend /dbfs to the path, so in your case it should be:
with open('/dbfs/mnt/test_file.json', 'r') as f:
    for line in f:
        print(line)
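If you want the parsed JSON rather than raw lines, the same /dbfs path works with the standard json module; a small sketch, assuming the file contains a single JSON document:

import json

with open('/dbfs/mnt/test_file.json', 'r') as f:
    data = json.load(f)  # parse the whole file at once
print(data)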
See more details in the docs at https://docs.databricks.com/data/databricks-file-system.html#local-file-apis, especially regarding limitations. With Databricks Runtime 5.5 and below there is a 2 GB file limit. With 6.0+ there is no longer such a limit, as the FUSE mount has been optimized to deal with larger file sizes.
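If you do hit that limit on an older runtime, one workaround (a sketch on my part, not taken from the docs link above) is to copy the file to the driver's local disk with dbutils.fs.cp and open it from there, bypassing the FUSE mount:

# Copy from DBFS to the driver's local filesystem (file:/ prefix), then read locally
dbutils.fs.cp("dbfs:/mnt/test_file.json", "file:/tmp/test_file.json")
with open('/tmp/test_file.json', 'r') as f:
    print(f.read())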