I am trying to write a large dataset (10 cols, 100M records) from R to SAP HANA using RJDBC's dbWritetable in the following way
library("RJDBC")
drv <- JDBC("com.sap.db.jdbc.Driver", "/data/hdbclient/ngdbc.jar", "'")
database <- dbConnect( drv,"jdbc:sap://servername", "USER", "PASS")
dbWriteTable(database, "largeSet", largeSet)
This works, but is extremely slow (75k records per HOUR). I have tested RODBC (sqlsave
) as well and this shows the same issue.
Looking at the code behind dbWriteTable
it seems that writing is record by record (i.e. the same as insert into) and indeed using a line by line insert into using dbSendUpdate
shows the same performance. I have verified that the problem is not in the connection speed itself.
ROracle has a bulk_write
option which seems to solve this issue, but since I am trying to write to HANA I need RJDBC or RODBC.
Can anyone tell me how I can speed up the write to HANA by running a bulk write or some other method?
If your main goal is a speed-up, without changing too much else, you could switch to the sjdbc
package, which is a lot more performant in this respect than RJDBC
(which sadly didn't get much attention in recent years).
While I write this and check back on CRAN, it looks like Simon has just recently found back to it and published a new release only a week ago. This does in fact include an improvement to dbSendUpdate
:
https://cran.r-project.org/web/packages/RJDBC/NEWS
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With