Spark: [dataframe].write.option("mode","overwrite").saveAsTable("foo") fails with 'already exists' if foo exists

Tags: overwrite, sql, scala, apache-spark

I think I am seeing a bug in Spark where mode 'overwrite' is not respected. Instead, an exception is thrown when I call saveAsTable on a table that already exists, even though I pass mode 'overwrite'.

Below is a short script that reproduces the issue. The last write statement fails with a stack trace reading:

 org.apache.spark.sql.AnalysisException: Table `example` already exists.;

Any advice much appreciated.

spark.sql("drop table if exists example ").show()
case class Person(first: String, last: String, age: Integer)
val df = List(
    Person("joe", "x", 9),
    Person("fred", "z", 9)).toDF()
df.write.option("mode","overwrite").saveAsTable("example")

val recover1 = spark.read.table("example")
recover1.show()


val df3 = List(
    Person("mouse", "x", 9),
    Person("golf", "z", 9)).toDF()

df3.write.option("mode","overwrite").saveAsTable("example")

val recover4 = spark.read.table("example")
recover4.show()

asked Aug 06 '19 04:08

Chris Bedford

1 Answer

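saveAsTable does not read the save mode from the extra options. option("mode", "overwrite") only stores a key/value pair in the writer's options, so the save mode stays at its default (ErrorIfExists) and the second write fails. Set the mode with mode directly (SaveMode is org.apache.spark.sql.SaveMode):

df3.write.mode(SaveMode.Overwrite).saveAsTable("example")

or

df3.write.mode("overwrite").saveAsTable("example")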
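For completeness, here is a minimal sketch of the original script with the fix applied. It assumes Scala 2 with Spark SQL on the classpath. The local session, the app name and the temporary warehouse directory are assumptions added for illustration; in spark-shell, where spark already exists, the builder lines can be skipped.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

// Assumption: a throwaway local session with a fresh warehouse directory,
// so the example does not touch any existing tables.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("overwrite-demo")
  .config("spark.sql.warehouse.dir", Files.createTempDirectory("warehouse").toString)
  .getOrCreate()
import spark.implicits._

case class Person(first: String, last: String, age: Integer)

spark.sql("drop table if exists example")

// First write: the table does not exist yet, so it is created.
List(Person("joe", "x", 9), Person("fred", "z", 9)).toDF()
  .write.mode(SaveMode.Overwrite).saveAsTable("example")

// Second write: the table exists; the overwrite save mode replaces its contents
// instead of raising "Table `example` already exists".
List(Person("mouse", "x", 9), Person("golf", "z", 9)).toDF()
  .write.mode("overwrite").saveAsTable("example")

// Only the rows from the second write should remain.
val firstNames = spark.read.table("example").select("first").as[String].collect().toSet
println(firstNames)
```

Both spellings, mode(SaveMode.Overwrite) and mode("overwrite"), set the same save mode; the sketch uses one of each only to show that they are interchangeable.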

answered Sep 19 '22 21:09

Gelerion
