As per my analysis, append will re-add the data, even though its available in the table, whereas overwrite Savemode will update existing date if any and will add addition row in the data frame.
val secondCompaniesDF = Seq((100, "comp1"), (101, "comp2"),(103,"comp2"))
.toDF("companyid","name")
secondCompaniesDF.write.mode(SaveMode.Overwrite)
.option("createTableColumnTypes","companyid int , name varchar(100)")
.jdbc(url, "Company", connectionProperties)
If SaveMode is Append, and this program is re-executed company will have 3 rows, whereas in case of Overwrite, if re-execute with any changes or addition row, existing records will be updated and new row will be added
Note: Overwrite drops the table and re-create the table. Is there any way where existing record get updated and new record get inserted something like upsert.
For upsert and merge you can use delta lake by databricks or HUDI
Here are the links
https://github.com/apache/hudi
https://docs.databricks.com/delta/delta-intro.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With