I've encountered an issue where Spark takes about an hour to insert hundreds of thousands of records into an MSSQL database using the JDBC driver.
Looking at the profile, I noticed that Spark (or, more likely, the JDBC driver) generates a separate insert for each row in my DataFrame, which is of course slow.
I looked through the JDBC configuration options and did not find a way to enable batched inserts.
Is there a way to configure a Spark application so that it inserts data using BULK INSERT or generates large batches?
Microsoft released a dedicated Spark connector for Azure SQL Database that provides this functionality, and it also works against a regular SQL Server database. You can see a bulk copy example on their GitHub page: https://github.com/Azure/azure-sqldb-spark#bulk-copy-to-azure-sql-database-or-sql-server
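Roughly, usage looks like the sketch below, adapted from the connector's README. The server, database, credentials, and table name are placeholders you'd replace with your own, and the `bulkCopyBatchSize`, `bulkCopyTableLock`, and `bulkCopyTimeout` values are tuning options, not requirements:

```scala
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._

// Connection and bulk-copy settings (placeholder values).
val bulkCopyConfig = Config(Map(
  "url"               -> "myserver.database.windows.net", // or an on-prem SQL Server host
  "databaseName"      -> "MyDatabase",
  "user"              -> "username",
  "password"          -> "*********",
  "dbTable"           -> "dbo.MyTable",
  "bulkCopyBatchSize" -> "2500",  // rows per batch sent to the server
  "bulkCopyTableLock" -> "true",  // take a table lock for faster loads
  "bulkCopyTimeout"   -> "600"    // seconds before the operation times out
))

// df is the DataFrame you want to insert; bulkCopyToSqlDB uses
// SQL Server's bulk copy API instead of row-by-row inserts.
df.bulkCopyToSqlDB(bulkCopyConfig)
```

With batched bulk copy, a load that previously took an hour of single-row inserts typically drops to minutes, since the round trip per row is eliminated.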