
ON DUPLICATE KEY UPDATE while inserting from pyspark dataframe to an external database table via JDBC

I'm using PySpark, and I have a Spark dataframe from which I insert data into a MySQL table:

url = "jdbc:mysql://hostname/myDB?user=xyz&password=pwd"

df.write.jdbc(url=url, table="myTable", mode="append")

I want to update a column value (which is not part of the primary key) by adding a specific number to its current value.

I've tried the DataFrameWriter.jdbc() function with different modes (append, overwrite).

My question is: how do I update a column value, the way ON DUPLICATE KEY UPDATE does in MySQL, while inserting the PySpark dataframe data into a table?

Richie asked Sep 16 '15 11:09


1 Answer

A workaround is to insert the data into a staging table, and then migrate it into the final table using a SQL statement executed by the driver program. Then you can use any valid SQL syntax supported by your database provider.
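A minimal sketch of that pattern, assuming a staging table named `myTable_staging`, the columns `id`, `name`, `counter`, and the `mysql-connector-python` package on the driver (all of these names are assumptions, not from the original post):

```python
def build_upsert_sql(target, staging, columns, increment_col, amount):
    """Build an INSERT ... SELECT that copies the staging table into the
    target table, adding `amount` to `increment_col` on key collisions.
    In the ON DUPLICATE KEY UPDATE clause, a bare column name refers to
    the existing row in the target table."""
    cols = ", ".join(columns)
    return (
        f"INSERT INTO {target} ({cols}) "
        f"SELECT {cols} FROM {staging} "
        f"ON DUPLICATE KEY UPDATE "
        f"{increment_col} = {increment_col} + {amount}"
    )

sql = build_upsert_sql("myTable", "myTable_staging",
                       ["id", "name", "counter"], "counter", 5)

# 1) Load the dataframe into the staging table with Spark:
# df.write.jdbc(url=url, table="myTable_staging", mode="overwrite")
#
# 2) Run the upsert from the driver (hypothetical connection details):
# import mysql.connector
# conn = mysql.connector.connect(host="hostname", user="xyz",
#                                password="pwd", database="myDB")
# conn.cursor().execute(sql)
# conn.commit()
```

Overwriting the staging table on each batch keeps it from accumulating stale rows between runs.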

ThatDataGuy answered Oct 23 '22 21:10