 

Spark write data by SaveMode as Append or overwrite

As per my analysis, Append will re-add the data even if it is already present in the table, whereas Overwrite will replace the table contents: existing rows are gone and the table ends up with exactly the rows in the DataFrame.

import org.apache.spark.sql.SaveMode
import spark.implicits._ // required for toDF on a Seq

val secondCompaniesDF = Seq((100, "comp1"), (101, "comp2"), (103, "comp2"))
  .toDF("companyid", "name")

secondCompaniesDF.write.mode(SaveMode.Overwrite)
  .option("createTableColumnTypes", "companyid INT, name VARCHAR(100)")
  .jdbc(url, "Company", connectionProperties)

If SaveMode is Append and this program is re-executed, Company will end up with 6 rows (the original 3 plus 3 duplicates), whereas with Overwrite a re-execution, with or without changed or additional rows, leaves the table containing only the rows in the DataFrame.

Note: Overwrite drops the table and re-creates it. Is there any way to update existing records and insert new ones, i.e. something like an upsert?
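As a side note on the drop-and-recreate behavior: Spark's JDBC writer supports a `truncate` option, which makes Overwrite issue a TRUNCATE TABLE instead of DROP/CREATE. This preserves the table definition (column types, indexes, grants), though it is still a full replace, not an upsert. A minimal sketch, reusing the DataFrame and connection variables from above:

```scala
// Assumes secondCompaniesDF, url and connectionProperties from the
// question. With truncate=true, Spark truncates "Company" rather than
// dropping it, so the existing schema and indexes survive the rewrite.
secondCompaniesDF.write
  .mode(SaveMode.Overwrite)
  .option("truncate", "true")
  .jdbc(url, "Company", connectionProperties)
```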

asked Oct 28 '25 06:10 by SushantPatade

1 Answer

For upserts and merges you can use Delta Lake (from Databricks) or Apache Hudi.

Here are the links

https://github.com/apache/hudi

https://docs.databricks.com/delta/delta-intro.html
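A minimal sketch of what the Delta Lake route looks like for the Company data in the question. The path `/tmp/company_delta` and the update rows are made up for illustration; this assumes the delta-core library is on the classpath and the target table was first written in Delta format:

```scala
import io.delta.tables.DeltaTable
import spark.implicits._

// One-time setup (hypothetical path): write the initial rows as a Delta table.
// secondCompaniesDF.write.format("delta").save("/tmp/company_delta")

val target = DeltaTable.forPath(spark, "/tmp/company_delta")

// Incoming batch: one changed record (101) and one new record (104).
val updates = Seq((101, "comp2-renamed"), (104, "comp4"))
  .toDF("companyid", "name")

target.as("t")
  .merge(updates.as("u"), "t.companyid = u.companyid")
  .whenMatched.updateAll()    // existing companyid -> update the row
  .whenNotMatched.insertAll() // new companyid -> insert the row
  .execute()
```

This gives exactly the upsert semantics asked for: matched keys are updated in place and unmatched keys are appended, without dropping the table.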

answered Oct 29 '25 20:10 by Shubham Jain