
Loading/Reading data in R taking up too much memory

Tags:

r

I am using R for some data analysis. System specs: i5 + 4GB RAM. For some reason, my R session is taking up much more RAM than the size of my data, which leaves me very little room for other operations.

I read a 550MB csv file; memory taken by R: 1.3-1.5GB. I saved the csv as a .RData file (file size: 183MB) and loaded it back into R; memory taken by R: 780MB. Any idea why this is happening and how to fix it?

Edit: The file has 123 columns and 1,190,387 rows. The variables are of type num and int.
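(For reference, base R can report how much memory a loaded object itself occupies; a minimal check, assuming the data frame is called df and the file name is a placeholder:)

    df <- read.csv("data.csv")             # hypothetical file name

    # Memory occupied by the object itself, in megabytes
    print(object.size(df), units = "MB")

    # Force a garbage collection and report R's overall memory use
    gc()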

asked Jul 30 '12 by Macbook


People also ask

How do I reduce memory usage in RStudio?

Go to Tools -> Memory and uncheck Show Current Memory Usage.

How do I limit memory usage in R?

Use memory.limit(). You can increase the default with memory.limit(size=2500), where size is in MB.
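A quick sketch of that call (note: memory.limit() only has an effect on Windows, and recent versions of R have deprecated it):

    memory.limit()              # report the current limit, in MB (Windows only)
    memory.limit(size = 2500)   # raise the limit to 2500 MB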

How much data can you load into R?

R objects live entirely in memory. It is not possible to index objects with huge numbers of rows and columns even on 64-bit systems (there is a roughly 2 billion vector index limit), and in practice you hit file size limits around 2-4 GB.
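That index limit comes from 32-bit integer indexing, which you can check from R itself:

    .Machine$integer.max
    # [1] 2147483647   (roughly the 2 billion element limit mentioned above)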

How many GB of data can R handle?

Today, R can address 8 TB of RAM if it runs on 64-bit machines. That is in many situations a sufficient improvement compared to about 2 GB addressable RAM on 32-bit machines. As an alternative, there are packages available that avoid storing data in memory.


1 Answer

A numeric value (double precision floating point) is stored in 8 bytes of RAM.
An integer value (in this case) uses 4 bytes.
Your data has 1,190,387 * 123 = 146,417,601 values.
If all columns were numeric, that would take 1,171,340,808 bytes of RAM (~1.09GB).
If all were integer, 585,670,404 bytes would be needed (~558MB).

Your columns are a mix of num and int, so the total lands between those two figures, and it makes perfect sense that your data uses 780MB of RAM.
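The same arithmetic in R, for anyone who wants to check it:

    n_values <- 1190387 * 123     # 146,417,601 values
    n_values * 8 / 1024^3         # ~1.09 GB if every column were numeric
    n_values * 4 / 1024^2         # ~558 MB if every column were integer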

Very General Advice:

  1. Convert your data.frame to a matrix. Matrix operations often have less overhead.
  2. Try the R package bigmemory: http://cran.r-project.org/web/packages/bigmemory/index.html
  3. Buy more RAM. Your machine may support up to 16GB.
  4. Don't load all your data into RAM at the same time. Load subsets of rows or columns, analyze, save results, and repeat (a sketch of this approach follows the list).
  5. Use a very small test dataset to design your analysis, then run the full dataset on another machine/server with more memory.
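A minimal sketch of point 4, processing the CSV in row chunks instead of loading it whole; the file name and chunk size are placeholders:

    chunk_size <- 100000
    con <- file("data.csv", open = "r")                   # hypothetical file name
    header <- strsplit(readLines(con, n = 1), ",")[[1]]   # column names from line 1

    repeat {
      chunk <- tryCatch(
        read.csv(con, header = FALSE, nrows = chunk_size, col.names = header),
        error = function(e) NULL    # read.csv errors once the connection is exhausted
      )
      if (is.null(chunk) || nrow(chunk) == 0) break
      # ... analyze this chunk, then save or accumulate the results ...
    }
    close(con)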
answered Nov 15 '22 by bdemarest