
Is there a function to split a large dataframe into n smaller dataframes of equal size (by row) and have an n+1 dataframe of smaller size?

Tags: split, r

The title pretty much states it. I have a data frame with 7+ million rows, far too big for me to analyze without my machine crashing. I want to split it into 100 smaller data frames of 70,000 rows each, and have a 101st data frame hold the remaining rows (< 70,000). It seems that this is non-trivial.

I know I could manually calculate the size of the n+1 data frame, remove it, and then use the split function in the following way:

d <- split(my_data_frame, rep(1:100, each = 70000))

But I have multiple large dataframes and doing all these calculations is tedious. Is there an alternative solution?

asked Jun 14 '15 at 18:06 by oort


1 Answer

How about something like this:

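df <- data.frame(x = 1:7235000, y = runif(7235000))
chunk_size <- round(NROW(df) / 100, -4)  # 72350 rounds to 70000
n_chunks <- ceiling(NROW(df) / chunk_size)  # full chunks plus one for any remainder
split(df, rep(seq_len(n_chunks), each = chunk_size, length.out = NROW(df)))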

Or abstracting some more:

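num_dfs <- 100
chunk_size <- round(NROW(df) / num_dfs, -4)
n_chunks <- ceiling(NROW(df) / chunk_size)  # full chunks plus one for any remainder
split(df, rep(seq_len(n_chunks), each = chunk_size, length.out = NROW(df)))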
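
Since you mention having several large data frames, it may help to wrap the idea in a small function. The following is only a sketch: `split_by_rows` and `chunk_size` are names made up for illustration, not part of base R, and the demo data frame is a tiny stand-in for the real one. It labels each row with its chunk number via integer arithmetic on the row index, so the last chunk simply ends up with whatever rows are left over.

```r
# Hypothetical helper (not a base R function): split a data frame into
# consecutive chunks of `chunk_size` rows; the last chunk holds the remainder.
split_by_rows <- function(data, chunk_size) {
  stopifnot(is.data.frame(data), chunk_size >= 1)
  # Rows 1..chunk_size get label 1, the next chunk_size rows get label 2, etc.
  group <- ceiling(seq_len(nrow(data)) / chunk_size)
  split(data, group)
}

# Small stand-in for the 7+ million row data frame
demo <- data.frame(x = 1:25, y = runif(25))
pieces <- split_by_rows(demo, chunk_size = 10)

length(pieces)        # number of chunks
sapply(pieces, nrow)  # rows per chunk
```

For the case in the question this would be called as `split_by_rows(my_data_frame, 70000)`. Because the chunk size is passed in directly, there is no need to work out the size of the leftover piece by hand.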

You may also want to consider something from the caret package, such as caret::createFolds(df$x), which returns a list of row indices, one element per fold.
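
A rough sketch of how the caret route could look, assuming the caret package is installed (the code is skipped otherwise). Keep in mind that `createFolds` assigns rows to folds by random sampling, so the pieces are roughly equal in size but are not consecutive blocks, and there is no separate smaller remainder piece. The `demo` data frame and `k = 10` are illustrative choices only.

```r
set.seed(1)  # fold assignment is random; fix the seed for reproducibility

# Small stand-in for the large data frame
demo <- data.frame(x = 1:1000, y = runif(1000))

if (requireNamespace("caret", quietly = TRUE)) {
  # List of integer row-index vectors, one per fold
  folds <- caret::createFolds(demo$x, k = 10)
  # Turn each index vector into its own data frame
  pieces <- lapply(folds, function(idx) demo[idx, , drop = FALSE])
  print(sapply(pieces, nrow))
}
```

This is better suited to cross-validation style work than to processing the rows in their original order.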

answered Sep 22 '22 at 10:09 by JasonAizkalns