
ON DUPLICATE KEY UPDATE while inserting from pyspark dataframe to an external database table via JDBC

I'm using PySpark, and I have a Spark dataframe from which I insert data into a MySQL table:

url = "jdbc:mysql://hostname/myDB?user=xyz&password=pwd"

df.write.jdbc(url=url, table="myTable", mode="append")

I want to update a column value (which is not part of the primary key) by adding a specific number to its current value.

I've tried the DataFrameWriter.jdbc() function with different modes (append, overwrite).

My question is: how do I update a column value, the way ON DUPLICATE KEY UPDATE does in MySQL, while inserting the PySpark dataframe data into a table?

Richie asked Sep 16 '15 11:09


1 Answer

A workaround is to insert the data into a staging table, and then migrate it into the final table using a SQL statement executed by the driver program. Then you can use any valid SQL syntax supported by your database provider.
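A minimal sketch of that pattern, assuming a staging table named `myTable_staging`, the columns `id`, `name`, `counter`, and the `mysql-connector-python` package on the driver (all of these names are assumptions, not from the original post):

```python
def build_upsert_sql(target, staging, columns, increment_col, amount):
    """Build an INSERT ... SELECT that copies the staging table into the
    target table, adding `amount` to `increment_col` on key collisions.
    In the ON DUPLICATE KEY UPDATE clause, a bare column name refers to
    the existing row in the target table."""
    cols = ", ".join(columns)
    return (
        f"INSERT INTO {target} ({cols}) "
        f"SELECT {cols} FROM {staging} "
        f"ON DUPLICATE KEY UPDATE "
        f"{increment_col} = {increment_col} + {amount}"
    )

sql = build_upsert_sql("myTable", "myTable_staging",
                       ["id", "name", "counter"], "counter", 5)

# 1) Load the dataframe into the staging table with Spark:
# df.write.jdbc(url=url, table="myTable_staging", mode="overwrite")
#
# 2) Run the upsert from the driver (hypothetical connection details):
# import mysql.connector
# conn = mysql.connector.connect(host="hostname", user="xyz",
#                                password="pwd", database="myDB")
# conn.cursor().execute(sql)
# conn.commit()
```

Overwriting the staging table on each batch keeps it from accumulating stale rows between runs.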

ThatDataGuy answered Oct 23 '22 21:10