I am new to R but am interested in using Shiny to create dynamic charts using data stored in a SQL Server database. To enable interactivity, I want to bring in the raw data from the database and perform calculations within R
rather than have the database summarize the data.
I am able to connect to the database using RODBC
, execute a query, and receive results in a data.frame
. However, the read time in R
is about 12x longer than than the same query executed in SQL Server Management Studio (SSMS). SSMS takes ~600 ms, whereas R
takes about 7.6 seconds. My question is whether I am doing something wrong, or is R
just really slow with database access? And if so, are there faster alternatives (e.g. writing the database output to a file and reading the file)?
Some information about the query that may help: The query retrieves about 250K rows with 4 columns. The first column is a date and the other three are numeric values. The machine running R
and SSMS is a high-end Win 7 workstation with 32GB of memory. The R
command that I am running is:
system.time(df <- sqlQuery(cn, query))
which returns:
user system elapsed
7.17 0.01 7.58
Interestingly, it appears that the data transfer from SQL to my machine is fast, but that R
is busy doing things internally for several seconds before returning the data.frame
. I see this because network utilization spikes in the first second and almost immediately returns to near 0. Then several seconds later, the R
data.frame
returns.
I would try RJDBC http://cran.r-project.org/web/packages/RJDBC/RJDBC.pdf
with these drivers https://msdn.microsoft.com/en-us/sqlserver/aa937724.aspx
library(RJDBC)
drv <- JDBC("com.microsoft.sqlserver.jdbc.SQLServerDriver","/sqljdbc4.jar")
con <- dbConnect(drv, "jdbc:sqlserver://server.location", "username", "password")
dbGetQuery(con, "select column_name from table")
I would make sure that your R timezone - Sys.setenv(TZ='GMT')
set to GMT for example - is same as the time zone of the SQL server from where you are pulling data. It could be that the date column is taking a long time to be interpreted especially if it has a timestamp.
RJDBC will run quicker because it converts date to character and everything else to numeric. RODBC will try to preserve the data type of the SQL table.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With