split a data.frame by columns using a grouping variable

It's fairly easy to split a data.frame by rows depending on a grouping factor. But how do I split by columns and possibly apply a function?

my.df <- data.frame(a = runif(10),
        b = runif(10),
        c = runif(10),
        d = runif(10))
grp <- as.factor(c(1,1, 2,2))

What I would like to have is the row-wise mean of the columns within each group.

What I have so far is a poor man's apply.

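# for each group level, take the row-wise mean of the columns in that group
lapply(as.list(as.numeric(levels(grp))), FUN = function(x, cn, data) {
            rowMeans(data[cn %in% x])  # index with the cn argument rather than the global grp
        }, cn = grp, data = my.df)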

EDIT: Thank you all for participating. I ran 10 replicates* on my working data.frame, which has roughly 22000 rows. These are the timings in seconds.

Roman: 2.19
Joris: 4.60
Joris #2: 3.79 # changed sapply to lapply, as suggested by Joris in the R chatroom
Gavin: 4.70
James & EDi: > 200 # * ran only one replicate because the difference was orders of magnitude
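
For anyone who wants to run a comparable timing, here is a minimal sketch using system.time() and replicate(). The data below is simulated (four columns, 22000 rows, an arbitrary seed) and `col_means_by_group` is a made-up helper name, so the numbers will not match the ones above.

```r
set.seed(1)  # arbitrary seed, only so the simulated data is reproducible
n <- 22000
big.df <- data.frame(a = runif(n), b = runif(n), c = runif(n), d = runif(n))
grp <- as.factor(c(1, 1, 2, 2))

# hypothetical helper: row-wise means of the columns in each group
col_means_by_group <- function(data, g) {
  lapply(levels(g), function(x) rowMeans(data[g == x]))
}

# total elapsed time, in seconds, for 10 replicates
timing <- system.time(replicate(10, col_means_by_group(big.df, grp)))
timing[["elapsed"]]
```

Swapping a different candidate into `col_means_by_group` keeps the comparison on equal footing, since the data and the number of replicates stay fixed.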

It struck me as odd that there is no wrapper function for the task at hand. Maybe someday we'll be able to do

apply(X = my.df, MARGIN = 3, INDEX = my.groups, FUN = mean) # :)
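
Until then, a small wrapper along these lines might do. This is only a sketch: `col_group_apply` is a hypothetical name, not a base R function, and it assumes INDEX has one entry per column of X.

```r
# Hypothetical helper, not part of base R:
# apply FUN row-wise to the columns of X that share a level of INDEX
col_group_apply <- function(X, INDEX, FUN, ...) {
  INDEX <- as.factor(INDEX)
  stopifnot(length(INDEX) == ncol(X))
  sapply(levels(INDEX), function(lv) {
    # drop = FALSE keeps a data.frame even when a group has a single column
    apply(X[, INDEX == lv, drop = FALSE], 1, FUN, ...)
  })
}

# small deterministic example (values chosen only for illustration)
my.df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12)
my.groups <- c(1, 1, 2, 2)
out <- col_group_apply(my.df, my.groups, mean)
out
```

The result has one row per row of my.df and one column per group. Because it goes through apply(), it is likely slower than rowMeans() for plain means, but it accepts any row-wise function.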

424

asked Apr 26 '11 11:04

Roman Luštrik

1 Answer

You can use the same logic, but in a more convenient form:

sapply(levels(grp), function(x) rowMeans(my.df[which(grp == x)]))
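
As a variation on the same idea, the columns can also be split explicitly before applying the function. A minimal sketch, assuming the my.df and grp from the question (recreated here with an arbitrary seed):

```r
set.seed(1)  # arbitrary seed, only for reproducibility
my.df <- data.frame(a = runif(10), b = runif(10), c = runif(10), d = runif(10))
grp <- as.factor(c(1, 1, 2, 2))

# split.default() treats the data.frame as a list of columns,
# so each piece is a data.frame holding the columns of one group
pieces <- split.default(my.df, grp)

# row-wise mean within each piece; one result column per group
res <- sapply(pieces, rowMeans)
dim(res)
```

Keeping `pieces` around is handy when several different functions have to be applied to the same column groups.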

136

answered Sep 27 '22 19:09

Joris Meys

Tags: split, dataframe, r
