It's fairly easy to split a data.frame by rows depending on a grouping factor. But how do I split by columns and possibly apply a function?
my.df <- data.frame(a = runif(10),
                    b = runif(10),
                    c = runif(10),
                    d = runif(10))
grp <- as.factor(c(1,1, 2,2))
What I would like to have is the mean of the columns by group.
What I have so far is a poor man's apply.
lapply(as.list(as.numeric(levels(grp))), FUN = function(x, cn, data) {
    rowMeans(data[grp %in% x])
}, cn = grp, data = my.df)
EDIT Thank you all for participating. I ran 10 replicates* and my working data.frame has roughly 22000 rows. These are the results in seconds.
Roman: 2.19
Joris: 4.60
Joris #2: 3.79 # changed sapply to lapply as suggested by Joris in the R chatroom.
Gavin: 4.70
James & EDi: > 200 # * ran only one replicate due to the large order of magnitude difference
It struck me as odd that there is no wrapper function for the task at hand. Maybe someday we'll be able to do
apply(X = my.df, MARGIN = 3, INDEX = my.groups, FUN = mean) # :)
Step 1: split the data into groups by creating a groupby object from the original DataFrame; Step 2: apply a function, in this case, an aggregation function that computes a summary statistic (you can also transform or filter your data in this step); Step 3: combine the results into a new DataFrame.
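In R, the same three steps can be written out explicitly for the column-wise case asked about here. This is only a minimal sketch, assuming `split.default()` is called directly so that the data frame is treated as a list of columns (plain `split()` on a data frame splits rows instead). The names `col.groups`, `group.means` and `result` are illustrative and do not come from the thread.

```r
set.seed(1)  # reproducible example data
my.df <- data.frame(a = runif(10), b = runif(10), c = runif(10), d = runif(10))
grp <- as.factor(c(1, 1, 2, 2))  # one entry per column of my.df

# Step 1: split the columns into groups; each element is a smaller data frame
col.groups <- split.default(my.df, grp)

# Step 2: apply a function to each group of columns
group.means <- lapply(col.groups, rowMeans)

# Step 3: combine the per-group results into one matrix (one column per group)
result <- do.call(cbind, group.means)
```

Here `result` has one row per row of `my.df` and one column per level of `grp`. Any function that accepts a data frame and returns one value per row could stand in for `rowMeans`.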
split() is a built-in R function that divides a vector or data frame into groups defined by a factor (or a list of factors). Note that R is case-sensitive, so the function is split(), not Split(). The syntax is: split(x, f, drop = FALSE, ...), where x is the vector or data frame to divide, f defines the grouping, and drop controls whether factor levels that do not occur are dropped.
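A small illustration of that syntax may help. The sketch below uses made-up data (`x`, `f` and `df` are assumptions for the example, not objects from the question) to show the effect of `drop` and the fact that, on a data frame, `split()` groups rows rather than columns.

```r
x <- c(10, 20, 30, 40, 50, 60)
# "c" is a declared level that never occurs in the data
f <- factor(c("a", "a", "b", "b", "b", "a"), levels = c("a", "b", "c"))

by.group      <- split(x, f)               # keeps the empty group "c"
by.group.used <- split(x, f, drop = TRUE)  # drops levels with no data

# On a data frame, split() groups ROWS, so f needs one entry per row
df <- data.frame(id = 1:4, value = c(2.5, 3.5, 1.0, 4.0))
by.rows <- split(df, c("low", "high", "low", "high"))
```

Because the data-frame method works on rows, splitting by columns needs a different route, for example indexing columns by group as in the answers here, or calling `split.default()` on the data frame.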
You can use the same logic, but in a more convenient form:
sapply(levels(grp), function(x) rowMeans(my.df[which(grp == x)]))
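To see what this call returns, here is a short, self-contained check. It assumes the example data from the question (with a seed added only for reproducibility) and compares the first group's result against a mean computed by hand; the name `res` is illustrative.

```r
set.seed(42)  # seed added for reproducibility; not part of the question
my.df <- data.frame(a = runif(10), b = runif(10), c = runif(10), d = runif(10))
grp <- as.factor(c(1, 1, 2, 2))

res <- sapply(levels(grp), function(x) rowMeans(my.df[which(grp == x)]))

dim(res)       # one row per row of my.df, one column per group
colnames(res)  # taken from levels(grp)

# Group "1" covers columns a and b, so its mean can be checked by hand
all.equal(unname(res[, "1"]), (my.df$a + my.df$b) / 2)
```

Since every group yields a vector of the same length, `sapply()` simplifies the result to a matrix. Swapping in `lapply()` would return a list with one element per group instead.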