 

Performance Issue with writing Spark Dataframes to Oracle Database

I am trying to save a Spark DataFrame to Oracle. The save works, but the performance is very poor.

I have tried 2 approaches:

  1. dfToSave.write().mode(SaveMode.Append).jdbc(…) -- which I assume uses the API below internally.
  2. JdbcUtils.saveTable(dfToSave, ORACLE_CONNECTION_URL, "table", props)

Both take very long: more than 3 minutes for a DataFrame of only 400-500 rows.
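For reference, this is the kind of write being timed. The sketch below assumes Spark 1.6's Java API; the connection URL, table name, and credentials are placeholders. Since SPARK-10040 the JDBC writer batches inserts, and the `batchsize` connection property controls how many rows go into each `executeBatch()` (the default is 1000), so raising it is one knob worth checking:

```java
import java.util.Properties;
import org.apache.spark.sql.SaveMode;

// Placeholders: substitute your real URL, table, and credentials.
Properties props = new Properties();
props.setProperty("user", "scott");
props.setProperty("password", "tiger");
// Rows per JDBC executeBatch(); default 1000. Larger batches mean
// fewer round trips to Oracle for the same number of rows.
props.setProperty("batchsize", "10000");

dfToSave.write()
        .mode(SaveMode.Append)
        .jdbc("jdbc:oracle:thin:@host:1521:SID", "MY_TABLE", props);
```

This is a sketch, not a verified fix: for only a few hundred rows the batch size alone is unlikely to explain a 3-minute write.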

I came across JIRA SPARK-10040, but it says the issue was resolved in 1.6.0, which is the version I am using.

Has anyone faced this issue and knows how to resolve it?

asked Mar 06 '26 by Nitin Kumar

1 Answer

I can tell you what happened to me. I had reduced the number of partitions for the database query, and as a result my previously performant processing became quite slow. Because the dataset is only materialized when it is written back to the database, I (like you) assumed the problem was in the Spark API, the driver, the connection, the table structure, the server configuration - anything. But no: you just have to repartition the DataFrame after your query, before writing it back.
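The fix above can be sketched as follows, again assuming Spark 1.6's Java API; `ORACLE_CONNECTION_URL`, `props`, and the partition count of 4 are illustrative, not prescribed values:

```java
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SaveMode;

// If the upstream query left the DataFrame with too few partitions,
// rebalance before the JDBC write. Each partition opens its own
// connection and writes its rows, so partition count directly shapes
// write parallelism.
DataFrame rebalanced = dfToSave.repartition(4); // 4 is an arbitrary example

rebalanced.write()
          .mode(SaveMode.Append)
          .jdbc(ORACLE_CONNECTION_URL, "table", props);
```

Note the trade-off: more partitions means more concurrent Oracle connections, so for a few hundred rows a small partition count is usually enough.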

answered Mar 11 '26 by Ion Freeman


