I've encountered an issue where Spark takes about an hour to insert hundreds of thousands of records into an MSSQL database using the JDBC driver.
Looking at the profile, I noticed that Spark (or, more likely, the JDBC driver) generates a separate insert for each row in my DataFrame, which is of course slow.
I looked through the JDBC configuration options and did not find a way to enable batched inserts.
Is there a way to configure a Spark application so that it inserts data using BULK INSERT or generates large batches?
Microsoft released a dedicated Spark connector for Azure SQL Database that provides this functionality, and it also works against a regular SQL Server database. You can see a bulk copy example on their GitHub page: https://github.com/Azure/azure-sqldb-spark#bulk-copy-to-azure-sql-database-or-sql-server
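Roughly, usage looks like the sketch below, adapted from the connector's README. The server, database, credentials, and table name are placeholders you'd replace with your own, and the `bulkCopyBatchSize`, `bulkCopyTableLock`, and `bulkCopyTimeout` values are tuning options, not requirements:

```scala
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

// Connection and bulk-copy settings (placeholder values).
val bulkCopyConfig = Config(Map(
  "url"               -> "myserver.database.windows.net", // or an on-prem SQL Server host
  "databaseName"      -> "MyDatabase",
  "user"              -> "username",
  "password"          -> "*********",
  "dbTable"           -> "dbo.MyTable",
  "bulkCopyBatchSize" -> "2500",  // rows per batch sent to the server
  "bulkCopyTableLock" -> "true",  // take a table lock for faster loads
  "bulkCopyTimeout"   -> "600"    // seconds before the operation times out
))

// df is the DataFrame you want to insert; bulkCopyToSqlDB uses
// SQL Server's bulk copy API instead of row-by-row inserts.
df.bulkCopyToSqlDB(bulkCopyConfig)
```

With batched bulk copy, a load that previously took an hour of single-row inserts typically drops to minutes, since the round trip per row is eliminated.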