Run a custom function on a data frame in R, by group

I am having trouble applying a custom function to each group in a data frame.

Here is some sample data:

set.seed(42)
tm <- as.numeric(c("1", "2", "3", "3", "2", "1", "2", "3", "1", "1"))
d <- as.numeric(sample(0:2, size = 10, replace = TRUE))
t <- as.numeric(sample(0:2, size = 10, replace = TRUE))
h <- as.numeric(sample(0:2, size = 10, replace = TRUE))

df <- as.data.frame(cbind(tm, d, t, h))
df$p <- rowSums(df[2:4])

I created a custom function to calculate the value w:

calc <- function(x) {
  data <- x
  # Weighted sum of the d, t and h totals, divided by the total of p
  w <- (1.27*sum(data$d) + 1.62*sum(data$t) + 2.10*sum(data$h)) / sum(data$p)
  w
}

When I run the function on the entire data set, I get the following answer:

calc(df)
[1] 1.664474

Ideally, I want to return results that are grouped by tm, e.g.:

tm     w
1    result of calc
2    result of calc
3    result of calc

So far I have tried using aggregate with my function, but I get the following error:

aggregate(df, by = list(tm), FUN = calc)
Error in data$d : $ operator is invalid for atomic vectors

I feel like I have stared at this too long and there is an obvious answer. Any advice would be appreciated.

asked Jul 15 '15 13:07

BillPetti

2 Answers

You can try split:

sapply(split(df, tm), calc)

#       1        2        3 
#1.665882 1.504545 1.838000

If you want a list instead, use lapply(split(df, tm), calc).
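A note on why the aggregate() attempt in the question fails, and one way to get the two-column layout that was asked for: aggregate() applies FUN to each column of each group separately, so calc() receives an atomic vector and data$d raises the error, whereas split() hands over a whole data frame per group. The following is a minimal base R sketch under that reading. It rebuilds the question's sample data with data.frame(), uses df$tm rather than the global tm, and the names w_by_group and result are only illustrative. No output is shown because the drawn values depend on the R version (the default sample() algorithm changed in R 3.6.0).

```r
# Sample data as in the question; the random draws depend on the R version.
set.seed(42)
df <- data.frame(
  tm = c(1, 2, 3, 3, 2, 1, 2, 3, 1, 1),
  d  = sample(0:2, size = 10, replace = TRUE),
  t  = sample(0:2, size = 10, replace = TRUE),
  h  = sample(0:2, size = 10, replace = TRUE)
)
df$p <- rowSums(df[c("d", "t", "h")])

# Same calculation as calc() in the question, written without the temporary copy.
calc <- function(x) {
  (1.27 * sum(x$d) + 1.62 * sum(x$t) + 2.10 * sum(x$h)) / sum(x$p)
}

# split() returns one data frame per tm value, so x$d is valid inside calc().
w_by_group <- sapply(split(df, df$tm), calc)

# Turn the named vector into a data frame with columns tm and w.
result <- data.frame(
  tm = as.numeric(names(w_by_group)),
  w  = unname(w_by_group)
)
print(result)
```

Referring to df$tm instead of tm keeps the call working even when no separate tm vector exists in the workspace.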

Or with data.table:

library(data.table)

setDT(df)[, calc(.SD), by = tm]
#   tm       V1
#1:  1 1.665882
#2:  2 1.504545
#3:  3 1.838000
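V1 is the name data.table gives to an unnamed result. If the column should be called w, as in the layout the question asks for, the result can be wrapped in .( ). This is a small sketch, not part of the answer above: it uses as.data.table() instead of setDT() so that df itself stays a plain data frame, and res_dt is an illustrative name. Output is omitted because the sampled values depend on the R version.

```r
library(data.table)

# Sample data and calc() as in the question.
set.seed(42)
df <- data.frame(
  tm = c(1, 2, 3, 3, 2, 1, 2, 3, 1, 1),
  d  = sample(0:2, size = 10, replace = TRUE),
  t  = sample(0:2, size = 10, replace = TRUE),
  h  = sample(0:2, size = 10, replace = TRUE)
)
df$p <- rowSums(df[c("d", "t", "h")])

calc <- function(x) {
  (1.27 * sum(x$d) + 1.62 * sum(x$t) + 2.10 * sum(x$h)) / sum(x$p)
}

# .SD holds the rows of the current group (all columns except tm);
# .(w = ...) names the result column w instead of the default V1.
res_dt <- as.data.table(df)[, .(w = calc(.SD)), by = tm]
print(res_dt)
```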

answered Oct 14 '22 08:10

Colonel Beauvel

Using dplyr:

library(dplyr)
df %>%
  group_by(tm) %>%
  do(data.frame(val = calc(.)))
#  tm      val
#1  1 1.665882
#2  2 1.504545
#3  3 1.838000
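As a side note, do() has been marked as superseded since dplyr 1.0.0. A possible equivalent with group_modify() is sketched below, assuming a dplyr version that provides it. The column name w and the variable res_dplyr are illustrative choices. Output is omitted because the sampled values depend on the R version.

```r
library(dplyr)

# Sample data and calc() as in the question.
set.seed(42)
df <- data.frame(
  tm = c(1, 2, 3, 3, 2, 1, 2, 3, 1, 1),
  d  = sample(0:2, size = 10, replace = TRUE),
  t  = sample(0:2, size = 10, replace = TRUE),
  h  = sample(0:2, size = 10, replace = TRUE)
)
df$p <- rowSums(df[c("d", "t", "h")])

calc <- function(x) {
  (1.27 * sum(x$d) + 1.62 * sum(x$t) + 2.10 * sum(x$h)) / sum(x$p)
}

# group_modify() calls the function once per group; .x is that group's rows
# (without the grouping column) and the returned data frame is bound to tm.
res_dplyr <- df %>%
  group_by(tm) %>%
  group_modify(~ data.frame(w = calc(.x))) %>%
  ungroup()
print(res_dplyr)
```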

If we change the function slightly so that it takes the columns as separate arguments, it also works with summarise:

calc1 <- function(d1, t1, h1, p1) {
  (1.27*sum(d1) + 1.62*sum(t1) + 2.10*sum(h1)) / sum(p1)
}
df %>%
  group_by(tm) %>%
  summarise(val = calc1(d, t, h, p))
#  tm      val
#1  1 1.665882
#2  2 1.504545
#3  3 1.838000

answered Oct 14 '22 10:10

akrun

Tags: function, r, aggregate, dplyr
