
How to split a data frame?

I want to split a data frame into several smaller ones. This looks like a trivial question, but I cannot find a solution through web search.

Leo5188 asked Jul 21 '10 18:07



2 Answers

You may want to cut the data frame into an arbitrary number of smaller data frames by passing split() a grouping vector. Here, we cut it into two at random.

x = data.frame(num = 1:26, let = letters, LET = LETTERS)
set.seed(10)
split(x, sample(rep(1:2, 13)))

gives

$`1`
   num let LET
3    3   c   C
6    6   f   F
10  10   j   J
12  12   l   L
14  14   n   N
15  15   o   O
17  17   q   Q
18  18   r   R
20  20   t   T
21  21   u   U
22  22   v   V
23  23   w   W
26  26   z   Z

$`2`
   num let LET
1    1   a   A
2    2   b   B
4    4   d   D
5    5   e   E
7    7   g   G
8    8   h   H
9    9   i   I
11  11   k   K
13  13   m   M
16  16   p   P
19  19   s   S
24  24   x   X
25  25   y   Y
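
The same idea extends to more than two pieces. As a minimal sketch (the names k, groups and chunk_id are illustrative and not part of the answer above), build a grouping vector with k distinct values and pass it to split(). The first variant assigns rows at random, the second keeps rows in contiguous blocks:

```r
x <- data.frame(num = 1:26, let = letters, LET = LETTERS)

k <- 4  # hypothetical number of pieces

# Random assignment: shuffle a vector that repeats 1..k across all rows
set.seed(10)
groups <- sample(rep(1:k, length.out = nrow(x)))
random_pieces <- split(x, groups)

# Contiguous assignment: rows 1-7 go to piece 1, the next 7 to piece 2, ...
chunk_id <- ceiling(seq_len(nrow(x)) / ceiling(nrow(x) / k))
ordered_pieces <- split(x, chunk_id)

length(random_pieces)         # number of pieces
sapply(ordered_pieces, nrow)  # rows in each contiguous piece
```

When the row count is not a multiple of k, the last contiguous piece is simply shorter than the others.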

You can also split a data frame based upon an existing column. For example, to create three data frames based on the cyl column in mtcars:

split(mtcars, mtcars$cyl)
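
split() returns a named list with one data frame per distinct value of the column. A short sketch of working with that result (the variable names by_cyl, four_cyl and restored are only for illustration):

```r
by_cyl <- split(mtcars, mtcars$cyl)

names(by_cyl)         # "4" "6" "8"
sapply(by_cyl, nrow)  # number of cars in each group

# Pull out one piece by name
four_cyl <- by_cyl[["4"]]

# Put the pieces back together in the original row order
restored <- unsplit(by_cyl, mtcars$cyl)
```

Note that the list names are character strings, so the pieces are indexed with "4", not the number 4.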
Greg answered Sep 19 '22 20:09


If you want to split a data frame according to the values of some variable, I'd suggest using dlply() from the plyr package, which returns a list (daply() returns an array instead).

library(plyr)
x <- dlply(df, .(splitting_variable), function(piece) piece)

Now, x is a list of data frames. To access one of them, index it with the name of the corresponding level of the splitting variable.

x$Level1
# or
x[["Level1"]]

Before splitting your data into many data frames, though, I'd first make sure there isn't a cleverer way to deal with it.
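
As one illustration of that caution: if the goal is only a per-group summary, base R can compute it without keeping separate data frames around. A sketch using mtcars and its cyl column purely as stand-in data (not the asker's data; the variable names are hypothetical):

```r
# Mean mpg for each value of cyl, computed directly on the full data frame
mean_mpg <- tapply(mtcars$mpg, mtcars$cyl, mean)

# Same summary through the formula interface, returned as a data frame
agg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# If the pieces are still needed, process them in one pass with lapply()
row_counts <- lapply(split(mtcars, mtcars$cyl), nrow)
```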

JoFrhwld answered Sep 19 '22 20:09