 

Performance Issue with writing Spark Dataframes to Oracle Database

I am trying to save a Spark DataFrame to Oracle. The save works, but the performance is very poor.

I have tried 2 approaches:

  1. dfToSave.write().mode(SaveMode.Append).jdbc(…) -- which I assume uses the API below internally.
  2. JdbcUtils.saveTable(dfToSave, ORACLE_CONNECTION_URL, "table", props)

Both take very long: more than 3 minutes for a DataFrame of only 400-500 rows.
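For reference, this is the kind of write being timed. The sketch below assumes Spark 1.6's Java API; the connection URL, table name, and credentials are placeholders. Since SPARK-10040 the JDBC writer batches inserts, and the `batchsize` connection property controls how many rows go into each `executeBatch()` (the default is 1000), so raising it is one knob worth checking:

```java
import java.util.Properties;
import org.apache.spark.sql.SaveMode;

// Placeholders: substitute your real URL, table, and credentials.
Properties props = new Properties();
props.setProperty("user", "scott");
props.setProperty("password", "tiger");
// Rows per JDBC executeBatch(); default 1000. Larger batches mean
// fewer round trips to Oracle for the same number of rows.
props.setProperty("batchsize", "10000");

dfToSave.write()
        .mode(SaveMode.Append)
        .jdbc("jdbc:oracle:thin:@host:1521:SID", "MY_TABLE", props);
```

This is a sketch, not a verified fix: for only a few hundred rows the batch size alone is unlikely to explain a 3-minute write.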

I came across JIRA SPARK-10040, but it says the issue was resolved in 1.6.0, which is the version I am using.

Has anyone faced this issue and knows how to resolve it?

asked Mar 06 '26 by Nitin Kumar

1 Answer

I can tell you what happened to me. I had reduced the number of partitions for the database query, and as a result my previously performant processing became quite slow. Because the dataset is only materialized when it is written back to the database, I (like you) assumed the problem was in the Spark API, the driver, the connection, the table structure, the server configuration - anything. But no: you just have to repartition the DataFrame after your query, before writing it back.
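The fix above can be sketched as follows, again assuming Spark 1.6's Java API; `ORACLE_CONNECTION_URL`, `props`, and the partition count of 4 are illustrative, not prescribed values:

```java
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SaveMode;

// If the upstream query left the DataFrame with too few partitions,
// rebalance before the JDBC write. Each partition opens its own
// connection and writes its rows, so partition count directly shapes
// write parallelism.
DataFrame rebalanced = dfToSave.repartition(4); // 4 is an arbitrary example

rebalanced.write()
          .mode(SaveMode.Append)
          .jdbc(ORACLE_CONNECTION_URL, "table", props);
```

Note the trade-off: more partitions means more concurrent Oracle connections, so for a few hundred rows a small partition count is usually enough.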

answered Mar 11 '26 by Ion Freeman


