The title pretty much states it. I have a data frame with more than 7 million rows, far too big for me to analyze without my machine crashing. I want to split it into 100 smaller data frames of 70,000 rows each, and have the 101st data frame hold the remaining rows (< 70,000). It seems that this is non-trivial.
I know I could manually calculate the size of the (n+1)th data frame, remove it, and then use the split function in the following way:
d <- split(my_data_frame, rep(1:100, each = 70000))
But I have multiple large dataframes and doing all these calculations is tedious. Is there an alternative solution?
How about something like this:
df <- data.frame(x = 1:7235000, y = runif(7235000))
split(df, rep(1:100, each = round(NROW(df) / 100, -4)))
Or abstracting some more:
num_dfs <- 100
split(df, rep(1:num_dfs, each = round(NROW(df) / num_dfs, -4)))
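One caveat with the rep(..., each = ...) calls above: the grouping vector they build has only 7,000,000 entries, and as far as I can tell split() recycles a grouping vector that is shorter than the number of rows (with a warning), so the leftover rows would end up in the first few groups rather than in a 101st data frame. A minimal sketch of an alternative, assuming fixed-size consecutive chunks are what is wanted, derives the group number from the row index. The helper name split_into_chunks and its chunk_size argument are hypothetical, not part of the original answer:

```r
# Hypothetical helper (not from the original answer): split a data frame
# into consecutive chunks of at most `chunk_size` rows each.
split_into_chunks <- function(data, chunk_size = 70000) {
  # Row i is assigned to group ceiling(i / chunk_size), so rows
  # 1..chunk_size form group 1, the next chunk_size rows form group 2,
  # and any leftover rows form a final, smaller group.
  groups <- ceiling(seq_len(nrow(data)) / chunk_size)
  split(data, groups)
}

# Small stand-in for the multi-million-row data frame so this runs quickly.
example_df <- data.frame(x = 1:25, y = runif(25))
chunks <- split_into_chunks(example_df, chunk_size = 10)

# Number of rows in each chunk.
sapply(chunks, nrow)
```

With 25 rows and a chunk size of 10, this gives three data frames of 10, 10 and 5 rows. Applied to a 7,235,000-row data frame with the default chunk size, the same logic should give 103 full chunks plus one smaller remainder chunk, with no manual size calculation per data frame.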
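You may also want to consider something from the caret package, such as caret::createFolds(df$x), which partitions the rows into k random folds of roughly equal size (k = 10 by default) rather than into fixed-size consecutive chunks.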