I think I'm seeing a bug in Spark where mode 'overwrite' is not respected; instead, an exception is thrown on an attempt to saveAsTable into a table that already exists (using mode 'overwrite').
Below is a small scriptlet that reproduces the issue. The last statement results in a stack trace reading:
org.apache.spark.sql.AnalysisException: Table `example` already exists.;
Any advice much appreciated.
spark.sql("drop table if exists example").show()

case class Person(first: String, last: String, age: Integer)

val df = List(
  Person("joe", "x", 9),
  Person("fred", "z", 9)).toDF()
df.write.option("mode", "overwrite").saveAsTable("example")

val recover1 = spark.read.table("example")
recover1.show()

val df3 = List(
  Person("mouse", "x", 9),
  Person("golf", "z", 9)).toDF()
df3.write.option("mode", "overwrite").saveAsTable("example")

val recover4 = spark.read.table("example")
recover4.show()
Overwrite mode means that when saving a DataFrame to a data source, if the data/table already exists, the existing data is expected to be overwritten by the contents of the DataFrame.
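For reference, this is a sketch of the four save modes that `DataFrameWriter.mode()` accepts, either as `SaveMode` constants or as their (case-insensitive) string equivalents:

```
import org.apache.spark.sql.SaveMode

// The four save modes accepted by DataFrameWriter.mode():
SaveMode.Append         // "append":        add rows to the existing data
SaveMode.Overwrite      // "overwrite":     replace the existing data
SaveMode.ErrorIfExists  // "errorifexists": throw an AnalysisException (the default)
SaveMode.Ignore         // "ignore":        do nothing if the data already exists
```

The default is `ErrorIfExists`, which is why the scriptlet above fails on the second `saveAsTable` call.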
saveAsTable doesn't read the save mode from extra options set via option(), so option("mode", "overwrite") is silently ignored and the default ErrorIfExists mode applies. Set the save mode directly with mode():
import org.apache.spark.sql.SaveMode
df3.write.mode(SaveMode.Overwrite).saveAsTable("example")
or, equivalently, with the string form:
df3.write.mode("overwrite").saveAsTable("example")
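Putting it together, a corrected version of the scriptlet would look like the sketch below (assuming a spark-shell or similar session where `spark` and the `toDF()` implicits are already in scope); the only change is replacing `option("mode", ...)` with `mode(...)`:

```
import org.apache.spark.sql.SaveMode

spark.sql("drop table if exists example").show()

case class Person(first: String, last: String, age: Integer)

val df = List(
  Person("joe", "x", 9),
  Person("fred", "z", 9)).toDF()
// First write: the table does not exist yet, so any mode succeeds.
df.write.mode(SaveMode.Overwrite).saveAsTable("example")

val df3 = List(
  Person("mouse", "x", 9),
  Person("golf", "z", 9)).toDF()
// Second write: the table exists, and Overwrite replaces its contents
// instead of throwing AnalysisException.
df3.write.mode(SaveMode.Overwrite).saveAsTable("example")

spark.read.table("example").show()  // shows the mouse/golf rows
```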