I am running a PySpark job in Databricks cloud. I need to write some CSV files to the Databricks File System (DBFS) as part of this job, and I also need to use some of the dbutils native commands, like:
# Mount an Azure Blob Storage container to a DBFS location
dbutils.fs.mount(source="...", mount_point="/mnt/...", extra_configs={"key": "value"})
I am also trying to unmount once the files have been written to the mount directory. But when I use dbutils directly in the PySpark job, it fails with:
NameError: name 'dbutils' is not defined
Should I import any package to use dbutils in PySpark code? Thanks in advance.
Databricks Utilities (dbutils) make it easy to perform powerful combinations of tasks. You can use the utilities to work with object storage efficiently, to chain and parameterize notebooks, and to work with secrets. dbutils is not supported outside of notebooks.
The filesystem utility can list all the folders and files under a specific mount point; for instance, dbutils.fs.ls("/mnt/location") lists all the directories within that mount point. The cp command, dbutils.fs.cp, copies a file or directory, possibly across filesystems; to display help for this command, run dbutils.fs.help("cp").
The Databricks File System (DBFS) is a distributed file system mounted into an Azure Databricks workspace and available on Azure Databricks clusters. DBFS is an abstraction on top of scalable object storage that maps Unix-like filesystem calls to native cloud storage API calls.
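For reference, inside a notebook (where dbutils is already defined) those filesystem utilities look roughly like this; the /mnt/location path and file names are only illustrative:

# List the folders and files under a mount point.
for info in dbutils.fs.ls("/mnt/location"):
    print(info.path, info.size)

# Copy a file, possibly across filesystems; pass recurse=True to copy a directory tree.
dbutils.fs.cp("/mnt/location/input.csv", "dbfs:/tmp/input.csv")

# Show the built-in help for the cp command.
dbutils.fs.help("cp")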
Try to use this:
def get_dbutils(spark):
    try:
        # On a Databricks cluster, DBUtils can be constructed from the SparkSession.
        from pyspark.dbutils import DBUtils
        dbutils = DBUtils(spark)
    except ImportError:
        # Fall back to the dbutils instance that Databricks injects into the
        # notebook's IPython user namespace.
        import IPython
        dbutils = IPython.get_ipython().user_ns["dbutils"]
    return dbutils
dbutils = get_dbutils(spark)
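With that helper in place, the mount / write CSV / unmount flow from the question can be sketched roughly as below. This is a minimal sketch, not a drop-in solution: the storage account, container, mount point, secret scope/key names, and the df DataFrame are placeholders you would replace with your own.

# Obtain a dbutils handle from the active SparkSession (see get_dbutils above).
dbutils = get_dbutils(spark)

# Mount an Azure Blob Storage container to DBFS.
# "mystorageaccount", "mycontainer", "my-scope" and "my-key" are placeholder names.
dbutils.fs.mount(
    source="wasbs://mycontainer@mystorageaccount.blob.core.windows.net",
    mount_point="/mnt/mycontainer",
    extra_configs={
        "fs.azure.account.key.mystorageaccount.blob.core.windows.net":
            dbutils.secrets.get(scope="my-scope", key="my-key")
    },
)

# Write the job's CSV output into the mounted location (df is your result DataFrame).
df.write.mode("overwrite").option("header", "true").csv("/mnt/mycontainer/output")

# Unmount once the files have been written.
dbutils.fs.unmount("/mnt/mycontainer")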