I'm able to establish a connection to my Databricks FileStore (DBFS) and access the FileStore.
Reading, writing, and transforming data with PySpark works, but when I try to use a local Python API such as pathlib or the os module, I am unable to get past the first level of the DBFS file system.
I can use a magic command:
%fs ls dbfs:/mnt/my_fs/...
which works perfectly and lists all the child directories, but if I run
os.listdir('/dbfs/mnt/my_fs/')
it returns ['mount.err'].
I've tested this on a new cluster and the result is the same.
I'm using Python on Databricks Runtime Version 6.1 with Apache Spark 2.4.4.
Is anyone able to advise?
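For reference, a single notebook cell that reproduces the comparison (using the mount path from above):

import os
from pathlib import Path

# DBFS path as seen by Spark / dbutils; this lists the child directories correctly
display(dbutils.fs.ls("dbfs:/mnt/my_fs/"))

# The same path through the local-file (FUSE) view used by os and pathlib
print(os.listdir("/dbfs/mnt/my_fs/"))            # returns ['mount.err'] on my cluster
print(list(Path("/dbfs/mnt/my_fs/").iterdir()))

On a healthy mount both views show the same children; here only the dbutils call does.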
Connection script:
I've used the Databricks CLI library to store my credentials, which are formatted according to the Databricks documentation:
def initialise_connection(secrets_func):
    configs = secrets_func()

    # Check if the mount exists
    bMountExists = False
    for item in dbutils.fs.ls("/mnt/"):
        if str(item.name) == r"WFM/":
            bMountExists = True

    # Drop the mount if it exists, to refresh credentials
    if bMountExists:
        dbutils.fs.unmount("/mnt/WFM")
        bMountExists = False

    # Mount the drive
    if not bMountExists:
        dbutils.fs.mount(
            source="adl://test.azuredatalakestore.net/WFM",
            mount_point="/mnt/WFM",
            extra_configs=configs
        )
        print("Drive mounted")
    else:
        print("Drive already mounted")
We experienced this issue when the same container was mounted to two different paths in the workspace. Unmounting all and remounting resolved our issue. We were using Databricks version 6.2 (Spark 2.4.4, Scala 2.11). Our blob store container config:
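The exact values aren't reproduced here, but a typical blob store container mount on that runtime looks roughly like the following (storage account, container, mount, and secret names are placeholders):

dbutils.fs.mount(
    source="wasbs://<container-name>@<storage-account>.blob.core.windows.net",
    mount_point="/mnt/<mount-name>",
    extra_configs={
        "fs.azure.account.key.<storage-account>.blob.core.windows.net":
            dbutils.secrets.get(scope="<scope-name>", key="<key-name>")
    }
)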
Notebook script to unmount all mounts in /mnt:
# Iterate through all mounts and unmount everything under /mnt/
print('Unmounting all mounts beginning with /mnt/')
display(dbutils.fs.mounts())
for mount in dbutils.fs.mounts():
    if mount.mountPoint.startswith('/mnt/'):
        dbutils.fs.unmount(mount.mountPoint)

# Re-list all mount points
print('Re-listing all mounts')
display(dbutils.fs.mounts())
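After remounting the container (the same dbutils.fs.mount pattern as in the sketch above), a quick check through the local file API should list real contents rather than mount.err:

import os
print(os.listdir('/dbfs/mnt/<mount-name>'))  # expect the container's directories, not ['mount.err']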
Assuming you have a separate process to create the mounts, create a job definition (job.json) to run the Python script on an automated cluster:
{
    "name": "Minimal Job",
    "new_cluster": {
        "spark_version": "6.2.x-scala2.11",
        "spark_conf": {},
        "node_type_id": "Standard_F8s",
        "driver_node_type_id": "Standard_F8s",
        "num_workers": 2,
        "enable_elastic_disk": true,
        "spark_env_vars": {
            "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
        }
    },
    "timeout_seconds": 14400,
    "max_retries": 0,
    "spark_python_task": {
        "python_file": "dbfs:/minimal/job.py"
    }
}
Python file (job.py) to print out the mounts:
import os

path_mounts = '/dbfs/mnt/'
print(f"Listing contents of {path_mounts}:")
print(os.listdir(path_mounts))

path_mount = path_mounts + 'YOURCONTAINERNAME'
print(f"Listing contents of {path_mount}:")
print(os.listdir(path_mount))
Run the following Databricks CLI commands to run the job. View the Spark driver logs for the output, confirming that mount.err does not exist.
databricks fs mkdirs dbfs:/minimal
databricks fs cp job.py dbfs:/minimal/job.py --overwrite
databricks jobs create --json-file job.json
databricks jobs run-now --job-id <JOBID FROM LAST COMMAND>
We have experienced the same issue when connecting to an Azure Gen2 storage account (without hierarchical namespaces).
The error seems to occur when switching the Databricks Runtime Environment from 5.5 to 6.x. However, we have not been able to pinpoint the exact reason for this. We assume some functionality might have been deprecated.