
Spark: [dataframe].write.option("mode","overwrite").saveAsTable("foo") fails with 'already exists' if foo exists

I think I am seeing a bug in Spark where mode 'overwrite' is not respected: an attempt to saveAsTable into a table that already exists throws an exception, even though mode 'overwrite' is set.

Below is a little scriptlet that reproduces the issue. The second saveAsTable call (on df3) results in a stack trace reading:

 org.apache.spark.sql.AnalysisException: Table `example` already exists.;

Any advice much appreciated.

spark.sql("drop table if exists example ").show()
case class Person(first: String, last: String, age: Integer)
val df = List(
    Person("joe", "x", 9),
    Person("fred", "z", 9)).toDF()
df.write.option("mode","overwrite").saveAsTable("example")

val recover1 = spark.read.table("example")
recover1.show()


val df3 = List(
    Person("mouse", "x", 9),
    Person("golf", "z", 9)).toDF()

 df3.write.
    option("mode","overwrite").saveAsTable("example")      

val recover4 = spark.read.table("example")
recover4.show()     
Chris Bedford asked Aug 06 '19 04:08



1 Answer

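saveAsTable doesn't read the save mode from the extra options set with option(), so option("mode","overwrite") has no effect and the default mode (error if the table exists) applies. Set the mode directly on the writer with mode():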

import org.apache.spark.sql.SaveMode
df3.write.mode(SaveMode.Overwrite).saveAsTable("example")

or

df3.write.mode("overwrite").saveAsTable("example")
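
For completeness, here is a minimal sketch of the question's scriptlet with the fix applied, written as a standalone program rather than a spark-shell session. The object name OverwriteExample, the app name, and the local[*] master are assumptions for illustration, and age is declared as Int rather than Integer; it also assumes a fresh local warehouse directory.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

// Same shape as the Person class in the question (age as Int here).
case class Person(first: String, last: String, age: Int)

object OverwriteExample {
  def main(args: Array[String]): Unit = {
    // Hypothetical local session; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .appName("overwrite-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    spark.sql("DROP TABLE IF EXISTS example")

    // First write creates the table.
    List(Person("joe", "x", 9), Person("fred", "z", 9)).toDF()
      .write.mode(SaveMode.Overwrite).saveAsTable("example")

    // Second write replaces the existing table instead of throwing,
    // because the save mode is set via mode(), not option("mode", ...).
    List(Person("mouse", "x", 9), Person("golf", "z", 9)).toDF()
      .write.mode(SaveMode.Overwrite).saveAsTable("example")

    // Only the rows from the second write should remain.
    val names = spark.read.table("example")
      .select("first").as[String].collect().toSet
    assert(names == Set("mouse", "golf"))

    spark.stop()
  }
}
```

The string form mode("overwrite") should behave the same way as SaveMode.Overwrite here; the enum just avoids typos in the mode name.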
Gelerion answered Sep 19 '22 21:09