
Practical limits of R data frame

I have been reading that read.table is not efficient for large data files, and that R is not suited to large data sets. Where can I find the practical limits, and any performance charts, for (1) reading in data of various sizes and (2) working with data of various sizes?

In effect, I want to know when performance deteriorates and when I hit a roadblock. Any comparison against C++, MATLAB, or other languages would be really helpful. Finally, any performance comparison specific to Rcpp and RInside would be great!

Egon asked Mar 08 '11 14:03



1 Answer

R is suited for large data sets, but you may have to change your way of working somewhat from what the introductory textbooks teach you. I did a post on Big Data for R which crunches a 30 GB data set and which you may find useful for inspiration.
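As one illustration of such a change in working habits, and of the read.table concern raised in the question, here is a minimal sketch of telling read.table the column types and row count up front rather than letting it guess. The file, column names, and sizes are made up for the example. It is not taken from the blog post mentioned above, and the actual gain will depend on your data.

```r
# Create a small sample file so the example is self-contained.
# (File contents and column names are hypothetical.)
tmp <- tempfile(fileext = ".csv")
sample_data <- data.frame(
  id    = 1:1000,
  value = rnorm(1000),
  label = sample(letters, 1000, replace = TRUE)
)
write.csv(sample_data, tmp, row.names = FALSE)

# Declaring colClasses avoids type guessing, nrows helps memory allocation,
# and comment.char = "" turns off comment scanning.
big <- read.table(
  tmp,
  header       = TRUE,
  sep          = ",",
  quote        = "\"",
  colClasses   = c("integer", "numeric", "character"),
  nrows        = 1000,
  comment.char = ""
)

unlink(tmp)  # remove the temporary file
str(big)
```

For files that are genuinely large, the same arguments apply unchanged; only the path and the column types need to match your data.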

The usual sources of information for getting started are the High-Performance Computing Task View and the R-SIG HPC mailing list.

The main limit you have to work around is a historic limit of 2^31-1 elements on the length of a vector, which wouldn't be so bad if R did not store matrices as vectors. (The limit is for compatibility with some BLAS libraries.)
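To make the arithmetic concrete, the sketch below compares the cell count of a hypothetical 50,000 x 50,000 matrix against the 2^31-1 figure, which R exposes as .Machine$integer.max. The dimensions are chosen only for illustration. Note that this answer dates from 2011; R 3.0.0 and later added long vectors on 64-bit builds, so check the documentation for your version before relying on this limit.

```r
# The classic vector-length limit is the largest R integer, 2^31 - 1.
max_len <- .Machine$integer.max

# Hypothetical matrix dimensions, chosen only for illustration.
n_rows <- 50000
n_cols <- 50000

# A matrix is stored as one vector, so its length is rows * cols.
# Numeric literals without an "L" suffix are doubles, so this does not overflow.
n_cells <- n_rows * n_cols

# TRUE means the matrix would not fit in a single classic-length vector.
exceeds_limit <- n_cells > max_len

# Approximate memory for double-precision storage (8 bytes per cell), in GiB.
size_gib <- n_cells * 8 / 2^30

cat("cells:", n_cells, " limit:", max_len, " exceeds:", exceeds_limit, "\n")
cat("approx. size in GiB:", round(size_gib, 1), "\n")
```

Neither dimension is anywhere near the limit on its own; it is the product that crosses it, which is why storing matrices as vectors makes the limit bite sooner than it otherwise would.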

We regularly analyse telco call data records and marketing databases with millions of customers using R, so we would be happy to talk more if you are interested.

Allan Engelhardt answered Sep 28 '22 12:09