 

"The associated location already exists" when saving a Spark DataFrame with mode('overwrite') set

I am saving a DataFrame with saveAsTable() and have mode('overwrite') set:


df1.write.format('parquet').mode('overwrite').saveAsTable(
    'spark_no_bucket_table1')

Why does saving the table still fail with the following error?

pyspark.sql.utils.AnalysisException: Can not create the managed table('`spark_no_bucket_table1`'). The associated location('file:experiments/spark-warehouse/spark_no_bucket_table1') already exists.
WestCoastProjects asked Oct 20 '25 11:10



1 Answer

From Spark's 2.4.0 migration guide:

Since Spark 2.4, creating a managed table with nonempty location is not allowed. An exception is thrown when attempting to create a managed table with nonempty location. To set true to spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation restores the previous behavior. This option will be removed in Spark 3.0.

So if you are using a Spark version >= 2.4.0 and < 3.0.0, you can work around the error by setting:

spark.conf.set("spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation","true")
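If the same job runs on several Spark versions, the version condition above can be checked before the flag is set. This is only a sketch: the helper name `needs_legacy_flag` is hypothetical, and it assumes the version string starts with numeric `major.minor` components, as `spark.version` normally does.

```python
def needs_legacy_flag(version: str) -> bool:
    """Return True for Spark versions >= 2.4.0 and < 3.0.0."""
    # Only the first two components matter, e.g. "2.4.8" -> (2, 4).
    major, minor = (int(part) for part in version.split(".")[:2])
    return (2, 4) <= (major, minor) < (3, 0)


# Hypothetical usage with an existing SparkSession named `spark`:
# if needs_legacy_flag(spark.version):
#     spark.conf.set(
#         "spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation",
#         "true")
```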

For Spark versions >= 3.0.0, you will have to manually clean up the data directory named in the error message before saving the table again.
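A minimal sketch of that cleanup, assuming the warehouse lives on the local filesystem (as the file: location in the error suggests) and that the leftover directory holds nothing you still need. The helper name `remove_stale_table_dir` is hypothetical; on HDFS or object storage you would use that system's own tooling instead.

```python
import shutil
from pathlib import Path


def remove_stale_table_dir(warehouse_dir: str, table_name: str) -> bool:
    """Delete a leftover table directory; return True if one was removed."""
    table_dir = Path(warehouse_dir) / table_name
    if not table_dir.is_dir():
        # Nothing to clean up, so saveAsTable() should not hit this error.
        return False
    shutil.rmtree(table_dir)  # Irreversibly removes the directory and its files.
    return True


# Hypothetical usage, run before the saveAsTable() call from the question:
# remove_stale_table_dir("experiments/spark-warehouse", "spark_no_bucket_table1")
```

Returning a boolean makes it easy to log whether a stale directory was actually present.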

Gabio answered Oct 23 '25 01:10
