NameError: name 'dbutils' is not defined in pyspark

I am running a PySpark job on Databricks. As part of this job I need to write some CSV files to the Databricks File System (DBFS), and I also need to use some of the native dbutils commands, such as:

# Mount Azure Blob storage to a DBFS location
dbutils.fs.mount(source="...", mount_point="/mnt/...", extra_configs={"key": "value"})

I am also trying to unmount once the files have been written to the mount directory. But when I use dbutils directly in the PySpark job, it fails with:

NameError: name 'dbutils' is not defined

Do I need to import a package to use dbutils in PySpark code? Thanks in advance.

Krishna Reddy asked Jun 12 '18 09:06



1 Answer

Try this helper, which builds dbutils from the SparkSession and falls back to the notebook's own instance:

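def get_dbutils(spark):
    try:
        # Works where the pyspark.dbutils module is present (e.g. on a Databricks cluster)
        from pyspark.dbutils import DBUtils
        dbutils = DBUtils(spark)
    except ImportError:
        # Otherwise fall back to the dbutils object a notebook places in the IPython namespace
        import IPython
        dbutils = IPython.get_ipython().user_ns["dbutils"]
    return dbutils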

dbutils = get_dbutils(spark)
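
Since the question also asks about unmounting after the files are written, here is a minimal sketch of one way to pair the mount and unmount calls so the unmount runs even if the write fails. The helper name `mounted` is hypothetical (it is not part of dbutils); it only relies on `dbutils.fs.mounts()`, `dbutils.fs.mount(...)` and `dbutils.fs.unmount(...)`, and it assumes you pass in the object returned by get_dbutils above.

```python
from contextlib import contextmanager


@contextmanager
def mounted(dbutils, source, mount_point, extra_configs=None):
    """Hypothetical helper: mount `source`, yield the mount point, then unmount."""
    # Check whether something is already mounted at this path.
    already_mounted = any(m.mountPoint == mount_point for m in dbutils.fs.mounts())
    if not already_mounted:
        dbutils.fs.mount(
            source=source,
            mount_point=mount_point,
            extra_configs=extra_configs or {},
        )
    try:
        yield mount_point
    finally:
        # Only unmount what this helper mounted itself.
        if not already_mounted:
            dbutils.fs.unmount(mount_point)


# Usage on Databricks (assumption: `df` is a DataFrame you want to write):
# dbutils = get_dbutils(spark)
# with mounted(dbutils, "...", "/mnt/...", {"key": "value"}) as path:
#     df.write.csv(path + "/output")
```

If the mount point already exists, the sketch leaves it in place on exit, so it does not unmount storage that another job may still be using.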
Elisabetta answered Sep 28 '22 01:09