Importing a big xlsx file into R?

Tags: r, excel, xlsx

I'm wondering if anyone knows of a way to import data from a "big" xlsx file (~20 MB). I tried the xlsx and XLConnect packages. Unfortunately, both use rJava and I always get the same error:

    > library(XLConnect)
    > wb <- loadWorkbook("MyBigFile.xlsx")
    Error: OutOfMemoryError (Java): Java heap space

or

    > library(xlsx)
    > mydata <- read.xlsx2(file="MyBigFile.xlsx")
    Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  :
      java.lang.OutOfMemoryError: Java heap space

I also tried to modify the java.parameters before loading rJava:

    > options( java.parameters = "-Xmx2500m")
    > library(xlsx) # load rJava
    > mydata <- read.xlsx2(file="MyBigFile.xlsx")
    Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  :
      java.lang.OutOfMemoryError: Java heap space

or after loading rJava (this is a bit stupid, I think):

    > library(xlsx) # load rJava
    > options( java.parameters = "-Xmx2500m")
    > mydata <- read.xlsx2(file="MyBigFile.xlsx")
    Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  :
      java.lang.OutOfMemoryError: Java heap space

But nothing works. Does anyone have an idea?

user2722443 asked Oct 02 '13 22:10



2 Answers

I stumbled on this question when someone sent me (yet another) Excel file to analyze. This one isn't even that big but for whatever reason I was running into a similar error:

java.lang.OutOfMemoryError: GC overhead limit exceeded 

Based on a comment by @DirkEddelbuettel in a previous answer, I installed the openxlsx package (http://cran.r-project.org/web/packages/openxlsx/) and then ran:

    library("openxlsx")
    mydf <- read.xlsx("BigExcelFile.xlsx", sheet = 1, startRow = 2, colNames = TRUE)

It was just what I was looking for. Easy to use and wicked fast. It's my new BFF. Thanks for the tip @DirkEddelbuettel!

BTW, I don't want to poach this answer from Dirk E, so if he posts an answer, please accept it rather than mine!
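If only part of a sheet is needed, read.xlsx also takes rows and cols arguments. Here is a minimal sketch of that, assuming openxlsx is installed. The temporary workbook and the column names (id, value, extra) are made up purely so the example runs on its own; in practice the path would point at the real file.

```r
library(openxlsx)

# Build a small stand-in workbook so the example is self-contained.
# The data and column names here are hypothetical.
path <- tempfile(fileext = ".xlsx")
demo <- data.frame(id = 1:1000,
                   value = seq(0.5, 500, by = 0.5),
                   extra = 1000:1)
write.xlsx(demo, path)

# Read the header row plus the next 100 rows, and only the first two columns.
subset_df <- read.xlsx(path, sheet = 1, rows = 1:101, cols = 1:2)

dim(subset_df)
```

Since openxlsx does not go through rJava, the Java heap setting plays no part here.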

orville jackson answered Oct 11 '22 03:10


    options(java.parameters = "-Xmx2048m")  ## memory set to 2 GB
    library(XLConnect)

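Allow for more memory using options() before any Java component is loaded, then load the XLConnect library (it uses Java). The JVM reads java.parameters only once, when it starts, so this has to run in a fresh R session; setting the option after rJava is already loaded has no effect.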

That's it. Start reading in data with readWorksheet .... and so on. :)
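To check whether the heap setting actually took effect, one option is to ask the JVM for its maximum heap size through rJava. This is only a sketch, assuming rJava and a working Java installation, run in a fresh R session; the variable names runtime and max_heap_mb are just illustrative.

```r
# Must be set before the JVM starts, i.e. before rJava (or any package
# that depends on it) is loaded.
options(java.parameters = "-Xmx2048m")

library(rJava)
.jinit()  # starts the JVM using getOption("java.parameters")

# Ask the running JVM how much heap it is allowed to use.
runtime <- .jcall("java/lang/Runtime", "Ljava/lang/Runtime;", "getRuntime")
max_heap_mb <- .jcall(runtime, "J", "maxMemory") / 1024^2

max_heap_mb
```

The reported value is usually close to the requested limit, though not necessarily identical. If it looks like the JVM default instead, the JVM was most likely already running when the option was set.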

viquanto answered Oct 11 '22 01:10