Can I get "BULK INSERT"-like speeds when inserting from Java into SQL Server?

During my quest to find the fastest method to get data from Java into SQL Server, I have noticed that the fastest Java method I can come up with is still 12 times slower than using BULK INSERT.

My data is generated from within Java, and BULK INSERT only supports reading data from a text file, so using BULK INSERT is not an option unless I write my data out to a temporary text file. That, in turn, would of course be a huge performance hit.

When inserting from Java, insert speeds are around 2,500 rows per second, even when I measure the time only after the for loop and just before the executeBatch() call. So "creating" the data in memory is not the bottleneck.

When inserting with BULK INSERT, insert speeds are around 30,000 rows per second.

Both tests have been done on the server itself, so the network is not a bottleneck either. Any clue as to why BULK INSERT is faster, and whether the same performance can be attained from within Java?

This is just a big dataset that needs to be loaded once, so it would be OK to temporarily disable any kind of logging (I already tried the simple recovery model), disable indexes (the table has none), locking, whatever...
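
For reference, one concrete way to reduce logging overhead for a one-off load is to switch the database to the BULK_LOGGED recovery model for the duration of the load, under which bulk operations such as BULK INSERT are minimally logged. A rough sketch from JDBC, reusing the conn from the test code below; the database name TestDb is a placeholder, since the question never names the database:

// Sketch only: "TestDb" is a placeholder database name.
try (java.sql.Statement stmt = conn.createStatement()) {
    // Bulk operations are minimally logged under BULK_LOGGED.
    stmt.execute("ALTER DATABASE TestDb SET RECOVERY BULK_LOGGED");
}
// ... perform the load ...
try (java.sql.Statement stmt = conn.createStatement()) {
    stmt.execute("ALTER DATABASE TestDb SET RECOVERY FULL"); // switch back afterwards
}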

My test setup so far:

Database:

CREATE TABLE TestTable   
   (  Col1 varchar(50)
    , Col2 int);  

Java:

// This seems to be essential to get good speeds, otherwise batching is not used.
conn.setAutoCommit(false);

PreparedStatement prepStmt = conn.prepareStatement("INSERT INTO TestTable (Col1, Col2) VALUES (?, ?)");
for (int i = 1; i <= 10000; i++) {
    prepStmt.setString(1, "X");
    prepStmt.setInt(2, 100);
    prepStmt.addBatch();
}
prepStmt.executeBatch();
conn.commit();
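
A common variation worth trying (not part of the original test) is to flush the batch in fixed-size chunks rather than accumulating all rows before a single executeBatch(); the chunk size of 1,000 here is an arbitrary starting point to tune:

// Variation: flush the batch every batchSize rows to keep memory bounded.
int batchSize = 1000;
for (int i = 1; i <= 10000; i++) {
    prepStmt.setString(1, "X");
    prepStmt.setInt(2, 100);
    prepStmt.addBatch();
    if (i % batchSize == 0) {
        prepStmt.executeBatch(); // send this chunk to the server
    }
}
prepStmt.executeBatch(); // flush any remaining rows
conn.commit();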

BULK INSERT:

-- A text file containing "X<tab>100" over and over again, i.e. the same data as generated in Java.
BULK INSERT TestTable FROM 'c:\test\test.txt';
asked Nov 07 '16 by Wouter

1 Answer

While BULK INSERT is the fastest way of doing a bulk insert, SQL Server supports remote (client-driven) bulk insert operations through both the native driver and ODBC. From version 4.2 of the JDBC driver onwards, this functionality is exposed through the SQLServerBulkCopy class, which does not read directly from files but does support reading from a RowSet, a ResultSet, or a custom implementation of ISQLServerBulkRecord for generated data. This functionality is equivalent to the .NET SqlBulkCopy class, with largely the same interface, and should be the fastest way of performing bulk operations short of a server-based BULK INSERT.
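
Since the question's data is generated in memory, a custom ISQLServerBulkRecord can feed SQLServerBulkCopy directly, with no intermediate file. The following is a minimal sketch, assuming the interface shape of the 4.2-era driver (getColumnOrdinals, getColumnName, getColumnType, getPrecision, getScale, isAutoIncrement, getRowData, next); the GeneratedDataRecord class is made up for illustration, and destinationConnection is assumed to be an open connection:

import com.microsoft.sqlserver.jdbc.*;

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class: streams generated rows straight to the server.
class GeneratedDataRecord implements ISQLServerBulkRecord {
    private final int totalRows;
    private int rowsReturned = 0;

    GeneratedDataRecord(int totalRows) { this.totalRows = totalRows; }

    // Two columns, matching TestTable (Col1 varchar(50), Col2 int).
    public Set<Integer> getColumnOrdinals() { return new HashSet<>(Arrays.asList(1, 2)); }
    public String getColumnName(int column) { return column == 1 ? "Col1" : "Col2"; }
    public int getColumnType(int column) { return column == 1 ? java.sql.Types.NVARCHAR : java.sql.Types.INTEGER; }
    public int getPrecision(int column) { return column == 1 ? 50 : 0; }
    public int getScale(int column) { return 0; }
    public boolean isAutoIncrement(int column) { return false; }

    // One generated row per call: the same "X", 100 data as in the question.
    public Object[] getRowData() { return new Object[] { "X", 100 }; }
    public boolean next() { return rowsReturned++ < totalRows; }
}

Used the same way as the file-based record below:

SQLServerBulkCopyOptions copyOptions = new SQLServerBulkCopyOptions();
copyOptions.setTableLock(true);
copyOptions.setBatchSize(300000);

SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(destinationConnection);
bulkCopy.setBulkCopyOptions(copyOptions);
bulkCopy.setDestinationTableName("TestTable");
bulkCopy.writeToServer(new GeneratedDataRecord(1000000)); // a million generated rows
bulkCopy.close();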

EDIT: Example by OP

Below you can find an example use case that can be used to test the performance of SQLServerBulkCSVFileRecord, a class that works with SQLServerBulkCopy but reads its data from a text file. In my test case, test.txt contained a million rows of "X<tab>100".

CREATE TABLE TestTable (Col1 varchar(50), Col2 int);

The table should not have any indexes enabled.

In Java:

// Make sure to use version 4.2 of the driver, as SQLServerBulkCSVFileRecord is not included in version 4.1.
import com.microsoft.sqlserver.jdbc.*;

import java.sql.Connection;
import java.sql.DriverManager;

long startTime = System.currentTimeMillis();

// true = the first line of the file contains column names.
SQLServerBulkCSVFileRecord fileRecord = new SQLServerBulkCSVFileRecord("C:\\temp\\test.txt", true);
fileRecord.addColumnMetadata(1, null, java.sql.Types.NVARCHAR, 50, 0);
fileRecord.addColumnMetadata(2, null, java.sql.Types.INTEGER, 0, 0);

Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
// A named instance takes a single backslash in the URL, i.e. "\\" in Java source.
Connection destinationConnection = DriverManager.getConnection("jdbc:sqlserver://Server\\Instance:1433", "user", "pass");

SQLServerBulkCopyOptions copyOptions = new SQLServerBulkCopyOptions();

// Depending on the size of the data being uploaded and the amount of RAM,
// an optimum can be found here. Play around with this to improve performance.
copyOptions.setBatchSize(300000);

// This is crucial to get good performance.
copyOptions.setTableLock(true);

SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(destinationConnection);
bulkCopy.setBulkCopyOptions(copyOptions);
bulkCopy.setDestinationTableName("TestTable");
bulkCopy.writeToServer(fileRecord);
bulkCopy.close();

long endTime = System.currentTimeMillis();
long totalTime = endTime - startTime;
System.out.println(totalTime + "ms");

Using this example, I was able to get insert speeds of up to 30,000 rows per second.

answered Sep 24 '22 by Jeroen Mostert