
Split dataframe using two columns of data and apply common transformation on list of resulting dataframes

Tags: split, dataframe, r

I want to split a large dataframe into a list of dataframes according to the values in two columns. I then want to apply a common data transformation (a lag transformation) to every dataframe in the resulting list. I'm aware of the split command, but I can only get it to work on one column of data at a time.

user1160760 asked Jan 20 '12 14:01




1 Answer

You need to put all the factors you want to split by in a list, e.g.:

split(mtcars, list(mtcars$cyl, mtcars$gear))

Then you can use lapply on the resulting list to apply whatever transformation you want to each dataframe.
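As a rough sketch of the lapply step, the following adds a lag-1 column to each dataframe in the list. The helper name add_lag, the new column name mpg_lag, and the choice of mpg as the column to lag are assumptions made for illustration; the question does not say which column is lagged. It also assumes the rows within each group are already in the order the lag should follow.

```r
# Split the built-in mtcars data by every combination of cyl and gear
groups <- split(mtcars, list(mtcars$cyl, mtcars$gear))

# Hypothetical helper: add a column holding the previous row's mpg (lag of 1).
# Prepending NA and truncating to nrow(df) also copes with zero-row groups.
add_lag <- function(df) {
  df$mpg_lag <- c(NA, df$mpg)[seq_len(nrow(df))]
  df
}

# Apply the same transformation to every dataframe in the list
lagged <- lapply(groups, add_lag)

# Inspect one group: the first lagged value is NA, the rest are shifted down
lagged[["4.4"]][, c("mpg", "mpg_lag")]
```

The list names are the two factor levels joined by a dot, so lagged[["4.4"]] is the group with cyl == 4 and gear == 4.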

If you want to avoid zero-row dataframes in the result, split has a drop argument. It defaults to FALSE, which is the opposite of the default for the drop argument of the "[" function.

split(mtcars, list(mtcars$cyl, mtcars$gear), drop = TRUE)
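To see what drop changes, a small comparison on the same mtcars data may help. The variable names below (by_both, all_groups, kept_groups, dropped) are only illustrative.

```r
by_both <- list(mtcars$cyl, mtcars$gear)

# Default (drop = FALSE): one list element per combination of levels,
# including combinations that never occur in the data
all_groups <- split(mtcars, by_both)

# drop = TRUE: only combinations that actually occur are kept
kept_groups <- split(mtcars, by_both, drop = TRUE)

# Row counts per group show which elements are empty
sapply(all_groups, nrow)

# Combinations that drop = TRUE removed
dropped <- setdiff(names(all_groups), names(kept_groups))
dropped  # "8.4": mtcars has no 8-cylinder cars with 4 gears
```

Dropping the empty groups up front means any function passed to lapply afterwards does not need special handling for zero-row dataframes.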
James answered Sep 23 '22 00:09