Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Write to SAP HANA with RJDBC using dbWritetable very slow due to record by record insert

Tags:

r

hana

rjdbc

I am trying to write a large dataset (10 cols, 100M records) from R to SAP HANA using RJDBC's dbWritetable in the following way

library("RJDBC")
drv <- JDBC("com.sap.db.jdbc.Driver", "/data/hdbclient/ngdbc.jar", "'")
database <- dbConnect( drv,"jdbc:sap://servername", "USER", "PASS")

dbWriteTable(database, "largeSet", largeSet)

This works, but is extremely slow (75k records per HOUR). I have tested RODBC (sqlsave) as well and this shows the same issue.

Looking at the code behind dbWriteTable it seems that writing is record by record (i.e. the same as insert into) and indeed using a line by line insert into using dbSendUpdate shows the same performance. I have verified that the problem is not in the connection speed itself.

ROracle has a bulk_write option which seems to solve this issue, but since I am trying to write to HANA I need RJDBC or RODBC.

Can anyone tell me how I can speed up the write to HANA by running a bulk write or some other method?

like image 394
Michiel Avatar asked Aug 17 '15 14:08

Michiel


1 Answers

If your main goal is a speed-up, without changing too much else, you could switch to the sjdbc package, which is a lot more performant in this respect than RJDBC (which sadly didn't get much attention in recent years).

While I write this and check back on CRAN, it looks like Simon has just recently found back to it and published a new release only a week ago. This does in fact include an improvement to dbSendUpdate:

https://cran.r-project.org/web/packages/RJDBC/NEWS

like image 155
RolandASc Avatar answered Nov 12 '22 18:11

RolandASc