
Spark and MSSQL insert speed

I've run into an issue where Spark takes about one hour to insert a few hundred thousand records into an MSSQL database using the JDBC driver.

  • Spark version: 2.2.0
  • MSSQL JDBC Driver version: 6.1.0.jre8

Looking at the profiler, I noticed that Spark (or, more likely, the JDBC driver) generates a separate INSERT for each row in my DataFrame, which is of course slow.

I looked at the JDBC configuration and did not find a way to enable batched inserts.

Is there a way to configure the Spark application so that it inserts data using BULK INSERT or generates large batches?

Evaldas Buinauskas asked Nov 17 '25 06:11



1 Answer

Microsoft has released a dedicated Spark connector for Azure SQL Database that provides this functionality, and it also works with a regular MSSQL database. There is a bulk insert example on its GitHub page: https://github.com/Azure/azure-sqldb-spark#bulk-copy-to-azure-sql-database-or-sql-server
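For reference, here is a minimal sketch of what a bulk copy with that connector might look like, based on the example in the linked README. The server, database, table name and credentials are placeholders, and the batch size, table lock and timeout values are only illustrative, so check the option names against the README for the connector version you actually install.

```scala
import org.apache.spark.sql.DataFrame
import com.microsoft.azure.sqldb.spark.config.Config
// Brings the bulkCopyToSqlDB extension method into scope for DataFrames.
import com.microsoft.azure.sqldb.spark.connect._

object BulkCopyExample {
  // Every value here is a placeholder; replace with your own settings.
  val bulkCopyConfig = Config(Map(
    "url"               -> "myserver.example.com", // assumed host name
    "databaseName"      -> "MyDatabase",           // assumed database
    "user"              -> "username",
    "password"          -> "password",
    "dbTable"           -> "dbo.Clients",          // assumed target table
    "bulkCopyBatchSize" -> "2500",                 // rows sent per bulk copy batch
    "bulkCopyTableLock" -> "true",                 // take a table lock during the copy
    "bulkCopyTimeout"   -> "600"                   // timeout in seconds
  ))

  // Writes the DataFrame through the bulk copy API instead of
  // the regular row-by-row JDBC insert path.
  def write(df: DataFrame): Unit =
    df.bulkCopyToSqlDB(bulkCopyConfig)
}
```

With the connector on the classpath, calling `BulkCopyExample.write(df)` from the Spark job would replace the regular `df.write.jdbc(...)` call. The README also shows an overload that takes explicit column metadata as a second argument.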

MxR answered Nov 18 '25 19:11
