 

How much data can R handle? [closed]

Tags:

r

large-data

By "handle" I mean manipulate multi-columnar rows of data. How does R stack up against tools like Excel, SPSS, SAS, and others? Is R a viable tool for looking at "BIG DATA" (hundreds of millions to billions of rows)? If not, which statistical programming tools are best suited for analysis large data sets?

AME asked Apr 03 '11 05:04


People also ask

Can R handle 2 million rows?

As a rule of thumb: data sets that contain up to about one million records can easily be processed with standard R. Data sets with roughly one million to one billion records can also be processed in R, but need some additional effort.
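
As a rough illustration (not part of the original answer), a table of a couple of million rows fits comfortably in memory; the sketch below assumes the data.table package, and the file name and column names are hypothetical:

    ## Minimal sketch: a few million rows handled entirely in RAM.
    ## Assumes data.table; "sales.csv", "amount" and "region" are hypothetical.
    library(data.table)

    dt <- fread("sales.csv")            # reads ~2 million rows in seconds
    nrow(dt)
    dt[, mean(amount), by = region]     # grouped summary, all in memory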

Can R handle 1 billion rows?

Even after filtering out missing and out-of-bounds data points, there are still 1.2 billion rows, and R can't do that without assistance.

    Rows: 4,677,864
    Columns: 3
    $ x <int> 1058, 1024, 1162, 3525, 865, 794, 856, 705, 647, 762, 802, 1207…
    $ y <int> 2189, 2040, 2265, 552, 1983, 1646, 2018, 1590, 1723, 2010, 1645…

How many GB of data can R handle?

The problem with large data sets in R is that R objects live entirely in memory. It is not possible to index objects with huge numbers of rows and columns even on 64-bit systems (there is a 2-billion vector index limit), and in practice you hit file size limits at around 2-4 GB.
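
A quick back-of-envelope check of the in-memory cost (base R only; sizes are approximate):

    ## Each double takes 8 bytes, so memory use grows linearly with rows.
    x <- numeric(1e6)      # one million doubles
    object.size(x)         # roughly 8 MB

    ## By the same arithmetic, 1e9 doubles would need about 8 GB of RAM
    ## for a single copy, before any copies made during manipulation.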

Does R have a row limit?

Similar to Excel, with Mac Numbers you'll see a warning if your file exceeds 1,000,000 rows. This one can be misleading and catch you off guard if you're dealing with large files. While they do not have a specific row limit, they do enforce a limit of 5 million cells.


2 Answers

If you look at the High-Performance Computing Task View on CRAN, you will get a good idea of what R can do in terms of high-performance computing and large data.
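
As a convenience (my addition, not part of the original answer), the ctv package can install everything listed in that task view; the view name below is assumed to match the one on CRAN:

    ## Install the packages listed in the High-Performance Computing view
    install.packages("ctv")
    library(ctv)
    install.views("HighPerformanceComputing")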

Roman Luštrik answered Oct 20 '22 04:10


You can in principle store as much data as you have RAM, with the exception that, currently, vectors and matrices are restricted to 2^31 - 1 elements because R uses 32-bit indexes on vectors. General vectors (lists, and their derivatives such as data frames) are also restricted to 2^31 - 1 components, and each of those components has the same restrictions as vectors/matrices/lists/data frames, etc.
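
A quick way to see the indexing limit referred to above (base R; on builds limited to 32-bit vector indexing, longer atomic vectors cannot be created):

    ## The 2^31 - 1 limit on vector length
    .Machine$integer.max   # 2147483647, i.e. 2^31 - 1

    ## On builds limited to 32-bit indexing, something like numeric(2^31)
    ## fails with "cannot allocate vector of length ...".
    ## (Left commented out: it would need ~16 GB even where it is allowed.)
    # x <- numeric(2^31)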

Of course, these are theoretical limits; if you want to do anything with data in R, it will inevitably require space to hold at least a couple of copies, as R will usually copy data passed into functions.
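
That copying can be made visible with base R's tracemem() (on builds where it is enabled); this is just an illustration of the copy-on-modify behaviour described above:

    x <- runif(1e6)
    tracemem(x)                          # start tracking duplications of x

    f <- function(v) { v[1] <- 0; v }    # modifying the argument forces a copy
    y <- f(x)                            # tracemem reports the duplication here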

There are efforts to allow on-disk storage (rather than in-RAM storage), but even those are restricted to the 2^31 - 1 limit mentioned above for whatever is in use in R at any one time. See the "Large memory and out-of-memory data" section of the High-Performance Computing Task View linked in @Roman's answer.
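
A hedged sketch of what such file-backed storage looks like, assuming the bigmemory package (one of the packages listed in that section; ff is a similar option); the file names are hypothetical:

    library(bigmemory)

    ## A 10-million-row matrix backed by files on disk rather than RAM
    m <- filebacked.big.matrix(nrow = 1e7, ncol = 5, type = "double",
                               backingfile = "big.bin",
                               descriptorfile = "big.desc")
    m[1, ] <- rnorm(5)   # only the slices you touch occupy memory
    m[1, ]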

Gavin Simpson answered Oct 20 '22 04:10