I have a dataframe made up of 400'000 rows and about 50 columns. As this dataframe is so large, it is too computationally taxing to work with. I would like to split this dataframe up into smaller ones, after which I will run the functions I would like to run, and then reassemble the dataframe at the end.
There is no grouping variable that I would like to use to split up this dataframe. I would just like to split it up by number of rows. For example, I would like to split this 400'000-row table into 400 1'000-row dataframes. How might I do this?
You could use split(), with rep() to create the groupings. (Note that the x and each arguments of rep() are flipped if the goal is to split the df into n parts rather than into chunks of n rows; and if you want to iteratively save each chunk as a CSV file with a unique filename, see the group_walk() example further down.)
Make your own grouping variable.
d <- split(my_data_frame,rep(1:400,each=1000))
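Since the stated goal is to run functions on each chunk and then reassemble, here is a minimal sketch of that round trip (f is a placeholder for whatever per-chunk function you need; it should return a data frame):

results <- lapply(d, f)                 # apply f to each 1,000-row chunk
reassembled <- do.call(rbind, results)  # stitch the pieces back together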
You should also consider the ddply() function from the plyr package, or the group_by() function from dplyr.
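For the plyr route, a minimal sketch under the same setup (the grp column and the identity body are placeholders for your own grouping and per-chunk work):

library(plyr)
my_data_frame$grp <- rep(1:400, each = 1000)  # same grouping variable as above
out <- ddply(my_data_frame, "grp", function(chunk) {
  chunk  # placeholder: replace with your per-chunk computation
})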
edited for brevity, after Hadley's comments.
If you don't know how many rows are in the data frame, or if its row count might not be an exact multiple of your desired chunk size, you can do

chunk <- 1000
n <- nrow(my_data_frame)
r <- rep(1:ceiling(n/chunk), each = chunk)[1:n]
d <- split(my_data_frame, r)
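A quick sanity check on the result; only the final chunk can come up short:

sapply(d, nrow)  # every chunk has 1000 rows except possibly the last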
You could also use
r <- ggplot2::cut_width(1:n,chunk,boundary=0)
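cut_width() returns a factor of interval labels rather than integer group ids, but it feeds into split() in exactly the same way:

d <- split(my_data_frame, r)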
For future readers, methods based on the dplyr and data.table packages will probably be (much) faster for doing group-wise operations on data frames, e.g. something like

my_data_frame %>%
  mutate(index = rep(1:ngrps, each = full_number)[seq_len(n())]) %>%
  group_by(index) %>%
  summarise(...)  # or mutate(), do(), etc.
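For instance, a self-contained toy version of that pipeline (the single-column data, ngrps, and the mean summary are illustrative only):

library(dplyr)
my_data_frame <- data.frame(x = rnorm(4e5))  # toy stand-in for the real data
ngrps <- 400
full_number <- nrow(my_data_frame) / ngrps   # rows per group
out <- my_data_frame %>%
  mutate(index = rep(1:ngrps, each = full_number)[seq_len(n())]) %>%
  group_by(index) %>%
  summarise(mean_x = mean(x))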
There are also many answers here.
I had a similar question and used this:
library(tidyverse)
n <- 100  # number of rows per group
split <- df %>%
  group_by(row_number() %/% n) %>%
  group_map(~ .x)
From left to right:
- split is the resulting object;
- df is your input data frame;
- group_by() groups the rows by row_number() %/% n, i.e. the row number integer-divided by n (the number of rows per group);
- group_map() applies a function to each group (here ~ .x, the identity) and returns a list.

So in the end your split is a list with, in each element, a group of your dataset. Alternatively, you could also immediately write your data to disk by replacing the group_map() call with e.g. group_walk(~ write_csv(.x, paste0("file_", .y, ".csv"))).
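And if you want a single data frame back after processing each list element, bind_rows() will reassemble the list (my_fun is a hypothetical function that returns a data frame per chunk):

result <- dplyr::bind_rows(lapply(split, my_fun))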
You can find more info on these powerful tools in the dplyr cheat sheet (which explains group_by()) and in the documentation for the group_map()/group_walk() family of follow-up functions.