I'm wondering if anyone knows of a way to import data from a "big" xlsx file (~20 MB). I tried the xlsx and XLConnect packages. Unfortunately, both use rJava and I always get the same error:
> library(XLConnect)
> wb <- loadWorkbook("MyBigFile.xlsx")
Error: OutOfMemoryError (Java): Java heap space
or
> library(xlsx)
> mydata <- read.xlsx2(file="MyBigFile.xlsx")
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
  java.lang.OutOfMemoryError: Java heap space
I also tried to increase the heap through the java.parameters option before loading rJava:
> options( java.parameters = "-Xmx2500m")
> library(xlsx) # load rJava
> mydata <- read.xlsx2(file="MyBigFile.xlsx")
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
  java.lang.OutOfMemoryError: Java heap space
or after loading rJava (this is a bit stupid, I think):
> library(xlsx) # load rJava
> options( java.parameters = "-Xmx2500m")
> mydata <- read.xlsx2(file="MyBigFile.xlsx")
Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl, :
  java.lang.OutOfMemoryError: Java heap space
But nothing works. Does anyone have an idea?
Importing Excel files into R using the readxl package: the readxl package, developed by Hadley Wickham, can be used to easily import Excel files (xls and xlsx) into R without external dependencies such as Java.
In the xlsx package, read.xlsx() and read.xlsx2() can be used to read the contents of an Excel worksheet into an R data.frame.
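A minimal readxl sketch. Because readxl does not go through rJava, the Java heap limit from the question does not apply. To keep it runnable, it reads the sample workbook that readxl ships (`readxl_example("datasets.xlsx")`) instead of the asker's file; swap in your own path.

```r
library(readxl)

# Sample workbook bundled with readxl; replace with "MyBigFile.xlsx" for real data.
path <- readxl_example("datasets.xlsx")

sheets <- excel_sheets(path)                 # list the sheet names in the workbook
iris_df <- read_excel(path, sheet = "iris")  # read one sheet into a tibble

# For the file in the question, something like:
# mydata <- read_excel("MyBigFile.xlsx", sheet = 1)
```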
I stumbled on this question when someone sent me (yet another) Excel file to analyze. This one isn't even that big, but for whatever reason I was running into a similar error:
java.lang.OutOfMemoryError: GC overhead limit exceeded
Based on a comment by @DirkEddelbuettel on a previous answer, I installed the openxlsx package (http://cran.r-project.org/web/packages/openxlsx/) and then ran:
library("openxlsx")
mydf <- read.xlsx("BigExcelFile.xlsx", sheet = 1, startRow = 2, colNames = TRUE)
It was just what I was looking for. Easy to use and wicked fast. It's my new BFF. Thanks for the tip @DirkEddelbuettel!
BTW, I don't want to poach this answer from Dirk E, so if he posts an answer, please accept it rather than mine!
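The `startRow = 2` in the openxlsx call above only makes sense if the header row sits on row 2. Here is a small self-contained sketch of that layout. The title row and the use of `head(mtcars)` are illustrative assumptions, not part of the original answer. It builds such a workbook in a temp file and reads it back the same way.

```r
library(openxlsx)

# Hypothetical layout: a title in row 1, the real header on row 2.
path <- tempfile(fileext = ".xlsx")
wb <- createWorkbook()
addWorksheet(wb, "Sheet1")
writeData(wb, sheet = "Sheet1", x = "Quarterly report", startRow = 1)  # title cell A1
writeData(wb, sheet = "Sheet1", x = head(mtcars), startRow = 2)        # header on row 2
saveWorkbook(wb, path, overwrite = TRUE)

# openxlsx parses the file without Java, so the JVM heap limit is not involved.
mydf <- read.xlsx(path, sheet = 1, startRow = 2, colNames = TRUE)
str(mydf)
```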
options(java.parameters = "-Xmx2048m")  ## memory set to 2 GB
library(XLConnect)
Allow for more memory using options() before any Java component is loaded, then load the XLConnect library (it uses Java).
That's it. Then start reading in data with readWorksheet() and so on. :)
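A sketch of the full XLConnect workflow under that answer's assumption: the option has to be set in a fresh R session, because the JVM reads `java.parameters` only once, when rJava starts. To stay self-contained, it first writes `mtcars` to a temp workbook (an illustrative stand-in for the big file) with `writeWorksheetToFile()`, then reads it back with `loadWorkbook()` and `readWorksheet()`.

```r
# Run in a fresh R session: once rJava has started, changing this option has no effect.
options(java.parameters = "-Xmx2048m")  # let the JVM use up to 2 GB of heap
library(XLConnect)

# Stand-in data; point `path` at "MyBigFile.xlsx" for the real file.
path <- tempfile(fileext = ".xlsx")
writeWorksheetToFile(path, data = mtcars, sheet = "cars")

wb <- loadWorkbook(path)
mydata <- readWorksheet(wb, sheet = "cars", header = TRUE)

xlcFreeMemory()  # ask the JVM to garbage-collect between large reads
```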