
How to overwrite data with PySpark's JDBC without losing schema?

I have a DataFrame that I want to write to a PostgreSQL database. If I simply use "overwrite" mode, like:

df.write.jdbc(url=DATABASE_URL, table=DATABASE_TABLE, mode="overwrite", properties=DATABASE_PROPERTIES)

The table is recreated and the data is saved. But the problem is that I'd like to keep the PRIMARY KEY and indexes on the table. So I'd like either to overwrite only the data, keeping the table schema, or to add the primary key constraint and indexes afterward. Can either be done with PySpark? Or do I need to connect to PostgreSQL and execute the commands to add the indexes myself?
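For reference, executing those commands manually would look something like this (a minimal sketch using psycopg2; the table name, column names, and connection string are all placeholders):

import psycopg2

# Placeholder connection string and names, for illustration only.
conn = psycopg2.connect("dbname=mydb user=myuser password=mypass host=localhost")
with conn, conn.cursor() as cur:
    # Re-add the constraint and index after Spark has recreated the table.
    cur.execute("ALTER TABLE my_table ADD PRIMARY KEY (id);")
    cur.execute("CREATE INDEX my_table_value_idx ON my_table (value);")
conn.close()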

asked Apr 04 '18 by Fernando Camargo

1 Answer

The default behavior of mode="overwrite" is to drop the table and recreate it with the new data. You can instead have Spark truncate the existing table by adding option("truncate", "true"), and then insert the new rows:

df.write.option("truncate", "true").jdbc(url=DATABASE_URL, table=DATABASE_TABLE, mode="overwrite", properties=DATABASE_PROPERTIES)

This way the table is not recreated, so your schema, including the primary key and indexes, is left untouched.
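For completeness, the same call with the connection details spelled out might look like this (a minimal sketch; the URL, credentials, driver, and table name are placeholders). Note that the truncate option only takes effect when the target table already exists; on a first run Spark still creates it:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("overwrite-example").getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# Placeholder connection details, for illustration only.
DATABASE_URL = "jdbc:postgresql://localhost:5432/mydb"
DATABASE_TABLE = "my_table"
DATABASE_PROPERTIES = {
    "user": "myuser",
    "password": "mypass",
    "driver": "org.postgresql.Driver",
}

# "truncate" makes Spark issue TRUNCATE TABLE instead of DROP/CREATE,
# so the primary key and indexes survive the overwrite.
df.write.option("truncate", "true").jdbc(
    url=DATABASE_URL,
    table=DATABASE_TABLE,
    mode="overwrite",
    properties=DATABASE_PROPERTIES,
)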

answered Sep 17 '22 by user3124181