Possible Duplicate:
Quickly reading very large tables as dataframes in R
Hi,
While trying to read a large dataset in R, the console displayed the following errors:
data <- read.csv("UserDailyStats.csv", sep = ",", header = TRUE,
                 na.strings = "-", stringsAsFactors = FALSE)
data <- data[complete.cases(data), ]
dataset <- data.frame(user_id = as.character(data[, 1]),
                      event_date = as.character(data[, 2]),
                      day_of_week = as.factor(data[, 3]),
                      distinct_events_a_count = as.numeric(as.character(data[, 4])),
                      total_events_a_count = as.numeric(as.character(data[, 5])),
                      events_a_duration = as.numeric(as.character(data[, 6])),
                      distinct_events_b_count = as.numeric(as.character(data[, 7])),
                      total_events_b = as.numeric(as.character(data[, 8])),
                      events_b_duration = as.numeric(as.character(data[, 9])))
Error: cannot allocate vector of size 94.3 Mb
In addition: Warning messages:
1: In data.frame(user_msisdn = as.character(data[, 1]), calls_date = as.character(data[, :
NAs introduced by coercion
2: In data.frame(user_msisdn = as.character(data[, 1]), calls_date = as.character(data[, :
NAs introduced by coercion
3: In class(value) <- "data.frame" :
Reached total allocation of 3583Mb: see help(memory.size)
4: In class(value) <- "data.frame" :
Reached total allocation of 3583Mb: see help(memory.size)
Does anyone know how to read large datasets? The size of UserDailyStats.csv is approximately 2GB.
Sure:
There is also a manual for this on the R site (the R Data Import/Export manual).
You could try specifying the data type of each column in the read.csv call using colClasses:
data <- read.csv("UserDailyStats.csv", sep = ",", header = TRUE,
                 na.strings = "-", stringsAsFactors = FALSE,
                 colClasses = c("character", "character", "factor",
                                rep("numeric", 6)))
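If you are not sure the classes line up with the file, one way to sanity-check first (a small sketch, not part of the original answer; the peek object name is just illustrative) is to read only the first few hundred rows with nrows and inspect what R detects:

# Read a small sample to verify column count and types before
# committing to the full ~2 GB read.
peek <- read.csv("UserDailyStats.csv", sep = ",", header = TRUE,
                 na.strings = "-", nrows = 500)
sapply(peek, class)  # compare against the intended colClasses vector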
Though with a dataset of this size it may still be problematic, and there won't be a great deal of memory left for any analysis you may want to do. Adding RAM and using 64-bit computing would give you more flexibility.
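The question linked at the top ("Quickly reading very large tables as dataframes in R") also covers faster, more memory-frugal readers. As one alternative (not part of this answer's original suggestion, and assuming the data.table package is installed), fread reads large delimited files with far less overhead than read.csv:

library(data.table)  # fread is data.table's fast, memory-lean file reader
dt <- fread("UserDailyStats.csv", sep = ",", na.strings = "-")
# fread returns a data.table; use as.data.frame(dt) if you need a plain data.frame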