split a data.frame by columns using a grouping variable

Tags: split, dataframe, r

It's fairly easy to split a data.frame by rows depending on a grouping factor. But how do I split by columns and possibly apply a function?

my.df <- data.frame(a = runif(10),
        b = runif(10),
        c = runif(10),
        d = runif(10))
grp <- as.factor(c(1,1, 2,2))

What I would like to have is the mean of the columns within each group.

What I have so far is a poor man's apply.

lapply(levels(grp), FUN = function(x, cn, data) {
            rowMeans(data[cn %in% x])  # select the columns whose group is x
        }, cn = grp, data = my.df)

EDIT: Thank you all for participating. I ran 10 replicates* on my working data.frame, which has roughly 22000 rows. These are the results in seconds.

Roman: 2.19
Joris: 4.60
Joris #2: 3.79 # changed sapply to lapply, as suggested by Joris in the R chatroom
Gavin: 4.70
James & EDi: > 200 # * ran only one replicate because it was orders of magnitude slower
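
Timings like the ones above could be collected roughly as follows. This is only a sketch: the 22000-row data frame is simulated with runif (the real working data is not shown here), `time_it` is a hypothetical helper rather than an existing function, and the number it prints will differ from the figures reported above.

```r
# Simulated stand-in for the working data (assumption: 22000 rows, 4 columns).
set.seed(1)
big.df <- data.frame(a = runif(22000), b = runif(22000),
                     c = runif(22000), d = runif(22000))
grp <- as.factor(c(1, 1, 2, 2))

# Hypothetical helper: total elapsed seconds over n replicates of expr.
time_it <- function(expr, n = 10) {
  e <- substitute(expr)   # capture the call without evaluating it
  env <- parent.frame()   # evaluate it where time_it() was called
  system.time(for (i in seq_len(n)) eval(e, env))[["elapsed"]]
}

# Time the lapply-based approach from the question.
elapsed <- time_it(
  lapply(levels(grp), function(x) rowMeans(big.df[grp %in% x]))
)
elapsed
```

Wrapping the replicate loop inside a single system.time() call keeps the measurement overhead out of the per-replicate cost.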

It struck me as odd that there is no wrapper function for the task at hand. Maybe someday we'll be able to do

apply(X = my.df, MARGIN = 3, INDEX = my.groups, FUN = mean) # :)
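
Until something like that exists, a small wrapper can approximate it. The following is a sketch only: `colgroup_apply` is a made-up name, not a base R function, and it assumes FUN returns a single value for each row of a column block.

```r
# Hypothetical wrapper (not part of base R): apply FUN across the columns
# of each group, row by row.
colgroup_apply <- function(X, INDEX, FUN, ...) {
  INDEX <- as.factor(INDEX)
  stopifnot(length(INDEX) == ncol(X))
  # split.default() treats the data.frame as a list of columns,
  # so it splits by column rather than by row.
  blocks <- split.default(X, INDEX)
  sapply(blocks, function(block) apply(block, 1, FUN, ...))
}

# Toy data with fixed values so the result is easy to check by hand.
my.df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)
my.groups <- c(1, 1, 2, 2)
res <- colgroup_apply(X = my.df, INDEX = my.groups, FUN = mean)
res
```

The result has one row per row of my.df and one column per group. Using apply() over rows keeps FUN general, but for plain means rowMeans() on each block would be considerably faster.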
Roman Luštrik asked Apr 26 '11 11:04


1 Answer

You can use the same logic, but in a more convenient form:

sapply(levels(grp), function(x) rowMeans(my.df[which(grp == x)]))
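
As a quick check of what this returns, here is a small sketch with fixed values instead of runif. The toy numbers are an assumption chosen for easy arithmetic, and the split.default() line is an alternative spelling added for comparison, not something from the question.

```r
# Toy data: two rows, four columns, two column groups.
my.df <- data.frame(a = c(1, 2), b = c(3, 4), c = c(5, 6), d = c(7, 8))
grp <- as.factor(c(1, 1, 2, 2))

# Same call as above: one column of row means per level of grp.
out <- sapply(levels(grp), function(x) rowMeans(my.df[which(grp == x)]))
out

# Alternative for comparison: split the columns first, then take row means.
out2 <- sapply(split.default(my.df, grp), rowMeans)
```

Here column "1" of out holds the row means of a and b (2 and 3), and column "2" holds the row means of c and d (6 and 7). Both spellings give the same values.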
Joris Meys answered Sep 27 '22 19:09