I've a CSV file which has 600,000 rows and 1339 columns making 1.6 GB. 1337 columns are binaries taking either 1 or 0 values and other 2 columns are numeric and character variables.
I pulled the data use the package readr with following code
VLU_All_Before_Wide <- read_csv("C:/Users/petas/Desktop/VLU_All_Before_Wide_Sample.csv")
When I checked the object size using following code, it's about 3 gb.
> print(object.size(VLU_All_Before_Wide),units="Gb")
3.2 Gb
In the next step, using the below code, I want to create training and test set for LASSO regression.
set.seed(1234)
train_rows <- sample(1:nrow(VLU_All_Before_Wide), .7*nrow(VLU_All_Before_Wide))
train_set <- VLU_All_Before_Wide[train_rows,]
test_set <- VLU_All_Before_Wide[-train_rows,]
yall_tra <- data.matrix(subset(train_set, select=VLU_Incidence))
xall_tra <- data.matrix(subset(train_set, select=-c(VLU_Incidence,Replicate)))
yall_tes <- data.matrix(subset(test_set, select=VLU_Incidence))
xall_tes <- data.matrix(subset(test_set, select=-c(VLU_Incidence,Replicate)))
When I started my R session the RAM was at ~3 gb and by the time I exicuted all the above code it's now at 14 gb, leaving me an error saying can't allocate vector of size 4 gb. There was no other application running other than 3 chrome windows. I removed the original dataset, training and test dataset but it only reduced .7 to 1 gb RAM.
rm(VLU_All_Before_Wide)
rm(test_set)
rm(train_set)
Appreciate if someone can guide me a way to reduce the size of the data.
Thanks
R struggles when it comes to huge datasets because it tries to load and keep all the data into the RAM. You can use other packages available in R which are made to handle big datasets, like 'bigmemory
and ff
. Check my answer here which addresses a similar issue.
You can also choose to do some data processing & manipulation outside R and remove unnecessary columns and rows. But still, to handle big datasets, it's better to use the capable packages.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With