 

Spark write data by SaveMode as Append or overwrite

As per my analysis, Append will re-add the data even if it is already present in the table, whereas Overwrite will replace the table contents: existing rows are gone and the table ends up with exactly the rows in the DataFrame.

import org.apache.spark.sql.SaveMode
import spark.implicits._ // required for toDF on a Seq

val secondCompaniesDF = Seq((100, "comp1"), (101, "comp2"), (103, "comp2"))
  .toDF("companyid", "name")

secondCompaniesDF.write.mode(SaveMode.Overwrite)
  .option("createTableColumnTypes", "companyid INT, name VARCHAR(100)")
  .jdbc(url, "Company", connectionProperties)

If SaveMode is Append and this program is re-executed, Company will end up with 6 rows (the original 3 plus 3 duplicates), whereas with Overwrite a re-execution, with or without changed or additional rows, leaves the table containing only the rows in the DataFrame.

Note: Overwrite drops the table and re-creates it. Is there any way to update existing records and insert new ones, i.e. something like an upsert?
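As a side note on the drop-and-recreate behavior: Spark's JDBC writer supports a `truncate` option, which makes Overwrite issue a TRUNCATE TABLE instead of DROP/CREATE. This preserves the table definition (column types, indexes, grants), though it is still a full replace, not an upsert. A minimal sketch, reusing the DataFrame and connection variables from above:

```scala
// Assumes secondCompaniesDF, url and connectionProperties from the
// question. With truncate=true, Spark truncates "Company" rather than
// dropping it, so the existing schema and indexes survive the rewrite.
secondCompaniesDF.write
  .mode(SaveMode.Overwrite)
  .option("truncate", "true")
  .jdbc(url, "Company", connectionProperties)
```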

asked Oct 28 '25 06:10 by SushantPatade

1 Answer

For upserts and merges you can use Delta Lake (from Databricks) or Apache Hudi.

Here are the links

https://github.com/apache/hudi

https://docs.databricks.com/delta/delta-intro.html
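A minimal sketch of what the Delta Lake route looks like for the Company data in the question. The path `/tmp/company_delta` and the update rows are made up for illustration; this assumes the delta-core library is on the classpath and the target table was first written in Delta format:

```scala
import io.delta.tables.DeltaTable
import spark.implicits._

// One-time setup (hypothetical path): write the initial rows as a Delta table.
// secondCompaniesDF.write.format("delta").save("/tmp/company_delta")

val target = DeltaTable.forPath(spark, "/tmp/company_delta")

// Incoming batch: one changed record (101) and one new record (104).
val updates = Seq((101, "comp2-renamed"), (104, "comp4"))
  .toDF("companyid", "name")

target.as("t")
  .merge(updates.as("u"), "t.companyid = u.companyid")
  .whenMatched.updateAll()    // existing companyid -> update the row
  .whenNotMatched.insertAll() // new companyid -> insert the row
  .execute()
```

This gives exactly the upsert semantics asked for: matched keys are updated in place and unmatched keys are appended, without dropping the table.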

answered Oct 29 '25 20:10 by Shubham Jain