
Azure Databricks - Can not create the managed table. The associated location already exists

I have the following problem in Azure Databricks. Sometimes when I try to save a DataFrame as a managed table:

SomeData_df.write.mode('overwrite').saveAsTable("SomeData")

I get the following error:

"Can not create the managed table('SomeData'). The associated location('dbfs:/user/hive/warehouse/somedata') already exists.;"

I used to fix this problem by running a %fs rm command to remove that location, but now I'm using a cluster that is managed by a different user, and I can no longer run rm on that location.

For now the only fix I can think of is using a different table name.

What makes things even more peculiar is the fact that the table does not exist. When I run:

%sql
SELECT * FROM SomeData

I get the error:

Error in SQL statement: AnalysisException: Table or view not found: SomeData;

How can I fix it?

BuahahaXD asked Mar 27 '19 15:03


2 Answers

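It seems a few others have the same issue.

A temporary workaround is to use

dbutils.fs.rm("dbfs:/user/hive/warehouse/somedata/", True)

to remove the leftover table directory before re-creating the table. The path is the location reported in the error message (lower case). In a Scala cell, pass true instead of True.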
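Since the directory is deleted recursively, it may be worth confirming first that the table really is missing from the catalog. The following is only a sketch: `remove_orphaned_table_dir` is a hypothetical helper name, the `default` database is an assumption, and `spark` and `dbutils` are the objects a Databricks notebook already provides.

```python
def remove_orphaned_table_dir(spark, dbutils, table_name, location, database="default"):
    """Delete `location` only if `table_name` is not registered in `database`.

    Returns True if the directory was removed, False if it was left alone.
    """
    # The warehouse location in the error message is lower case,
    # so compare names case-insensitively.
    registered = {t.name.lower() for t in spark.catalog.listTables(database)}
    if table_name.lower() in registered:
        # The table really exists; deleting its files would destroy data.
        return False
    # The second argument (True) requests a recursive delete.
    dbutils.fs.rm(location, True)
    return True
```

In a notebook this could be called as `remove_orphaned_table_dir(spark, dbutils, "SomeData", "dbfs:/user/hive/warehouse/somedata")` right before `saveAsTable`. It still requires delete permission on that location, so it does not help on a cluster where `rm` is blocked.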

char answered Oct 16 '22 06:10


This generally happens when a cluster is shut down while writing a table. The recommended solution from the Databricks documentation is to set the flag spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation to true:

This flag deletes the _STARTED directory and returns the process to the original state. For example, you can set it in the notebook:

%py
spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")
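
If you would rather not leave the flag on for the whole session, one option is to enable it only around the failing write and then restore the previous value. This is a sketch under assumptions: `overwrite_with_legacy_flag` is a hypothetical helper, and it assumes a Spark 2.4 runtime. As far as I know, this legacy setting was removed in Spark 3.0, so it may be rejected on newer runtimes.

```python
LEGACY_FLAG = "spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation"

def overwrite_with_legacy_flag(spark, df, table_name):
    """Overwrite a managed table with the legacy flag enabled only for this write."""
    # Remember the current value so it can be restored afterwards.
    previous = spark.conf.get(LEGACY_FLAG, "false")
    spark.conf.set(LEGACY_FLAG, "true")
    try:
        df.write.mode("overwrite").saveAsTable(table_name)
    finally:
        # Restore the earlier setting even if the write fails.
        spark.conf.set(LEGACY_FLAG, previous)
```

With the DataFrame from the question, the call would be `overwrite_with_legacy_flag(spark, SomeData_df, "SomeData")`.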
Mike answered Oct 16 '22 04:10