I am having some trouble getting a custom function to run over each group in a data frame.
Here is some sample data:
set.seed(42)
tm <- as.numeric(c("1", "2", "3", "3", "2", "1", "2", "3", "1", "1"))
d <- as.numeric(sample(0:2, size = 10, replace = TRUE))
t <- as.numeric(sample(0:2, size = 10, replace = TRUE))
h <- as.numeric(sample(0:2, size = 10, replace = TRUE))
df <- as.data.frame(cbind(tm, d, t, h))
df$p <- rowSums(df[2:4])
I created a custom function to calculate the value w:
calc <- function(x) {
  data <- x
  w <- (1.27*sum(data$d) + 1.62*sum(data$t) + 2.10*sum(data$h)) / sum(data$p)
  w
}
When I run the function on the entire data set, I get the following answer:
calc(df)
[1] 1.664474
Ideally, I want to return results that are grouped by tm, e.g.:
tm     w
1    result of calc
2    result of calc
3    result of calc
So far I have tried using aggregate with my function, but I get the following error:
aggregate(df, by = list(tm), FUN = calc)
Error in data$d : $ operator is invalid for atomic vectors
I feel like I have stared at this too long and there is an obvious answer. Any advice would be appreciated.
group_by() belongs to the dplyr package for R and groups the rows of a data frame by one or more variables. On its own it does not compute anything; it should be followed by a function such as summarise() that performs an action on each group. It works much like GROUP BY in SQL or a pivot table in Excel.
split() is a built-in R function that divides a vector or data frame into the groups defined by a factor f and returns them as a list. The syntax is: split(x, f, drop = FALSE, ...)
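To see what split() hands back, here is a minimal sketch on a small made-up data frame (toy and its values are illustrative, not the data from the question):

```r
# Hypothetical toy data, not the data from the question
toy <- data.frame(tm = c(1, 1, 2, 2),
                  d  = c(1, 2, 0, 1),
                  h  = c(2, 0, 1, 1))

# split() returns a named list with one data frame per distinct value of tm
pieces <- split(toy, toy$tm)

names(pieces)                  # group labels, as character strings
sapply(pieces, is.data.frame)  # every piece is still a data frame
sapply(pieces, nrow)           # number of rows in each group
```

Because each piece is a full data frame, a function that uses $ on its argument can be applied to the list with sapply() or lapply().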
Separately, in the actuar package, a grouped data object is a special form of data frame consisting of one column of contiguous group boundaries and one or more columns of frequencies within each group. Its grouped.data() function can create such an object from two types of arguments.
All of the dplyr functions take a data frame (or tibble) as the first argument. Rather than forcing the user to either save intermediate objects or nest functions, dplyr provides the %>% operator from magrittr.
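As a rough illustration, the sketch below writes the same grouped summary once as a nested call and once as a pipeline. The toy data frame and the column name total_d are made up for the example.

```r
library(dplyr)

# Hypothetical toy data, not the data from the question
toy <- data.frame(tm = c(1, 1, 2, 2), d = c(1, 2, 0, 1))

# Nested form: read from the inside out
nested <- summarise(group_by(toy, tm), total_d = sum(d))

# Piped form: the same steps, read top to bottom
piped <- toy %>%
  group_by(tm) %>%
  summarise(total_d = sum(d))

# group_by() on its own only marks the groups; the rows are unchanged
nrow(group_by(toy, tm))
```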
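aggregate() fails here because it applies FUN to each column of each group separately, so calc receives a plain vector rather than a data frame. You can try split instead, which keeps each group as a data frame:
sapply(split(df, df$tm), calc)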
#       1        2        3 
#1.665882 1.504545 1.838000 
If you want a list, use lapply(split(df, df$tm), calc).
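A quick way to see why aggregate() failed where split() works is to check what each one passes to the function. A minimal sketch on made-up data (toy and the seen_by_* names are hypothetical):

```r
# Hypothetical toy data, not the data from the question
toy <- data.frame(tm = c(1, 1, 2, 2),
                  d  = c(1, 2, 0, 1),
                  t  = c(0, 1, 2, 2))

# aggregate() calls FUN once per column and per group, passing a bare vector
seen_by_aggregate <- aggregate(toy[c("d", "t")], by = list(tm = toy$tm),
                               FUN = function(v) is.data.frame(v))

# split() keeps each group as a data frame, so $ works inside the function
seen_by_split <- sapply(split(toy, toy$tm), is.data.frame)

seen_by_aggregate  # the d and t columns are FALSE
seen_by_split      # TRUE for every group
```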
Or with data.table:
library(data.table)
setDT(df)[, calc(.SD), by = tm]
#   tm       V1
#1:  1 1.665882
#2:  2 1.504545
#3:  3 1.838000
Or with dplyr:
library(dplyr)
df %>%
  group_by(tm) %>%
  do(data.frame(val = calc(.)))
#  tm      val
#1  1 1.665882
#2  2 1.504545
#3  3 1.838000
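do() still works but is marked as superseded in recent dplyr releases. One possible alternative, assuming dplyr 0.8.1 or later, is group_modify(). The sketch below uses made-up data (toy is hypothetical) and a condensed copy of calc:

```r
library(dplyr)

# Same formula as calc in the question, written without the temporary copy
calc <- function(x) {
  (1.27 * sum(x$d) + 1.62 * sum(x$t) + 2.10 * sum(x$h)) / sum(x$p)
}

# Hypothetical toy data, not the data from the question
toy <- data.frame(tm = c(1, 1, 2, 2),
                  d  = c(1, 2, 0, 1),
                  t  = c(0, 1, 2, 2),
                  h  = c(2, 0, 1, 1))
toy$p <- rowSums(toy[c("d", "t", "h")])

# .x is the data frame for one group, without the grouping column
res <- toy %>%
  group_by(tm) %>%
  group_modify(~ data.frame(val = calc(.x))) %>%
  ungroup()
res
```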
If we change the function slightly so that it takes the columns as separate arguments, it also works with summarise:
calc1 <- function(d1, t1, h1, p1) {
  (1.27 * sum(d1) + 1.62 * sum(t1) + 2.10 * sum(h1)) / sum(p1)
}
df %>%
  group_by(tm) %>%
  summarise(val = calc1(d, t, h, p))
#  tm      val
#1  1 1.665882
#2  2 1.504545
#3  3 1.838000
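Since w depends only on group totals, another option that stays in base R is to let aggregate() compute the sums and apply the formula afterwards. A minimal sketch on made-up data (toy and sums are names chosen for the example):

```r
# Hypothetical toy data, not the data from the question
toy <- data.frame(tm = c(1, 1, 2, 2),
                  d  = c(1, 2, 0, 1),
                  t  = c(0, 1, 2, 2),
                  h  = c(2, 0, 1, 1))
toy$p <- rowSums(toy[c("d", "t", "h")])

# Sum every column within each tm group
sums <- aggregate(cbind(d, t, h, p) ~ tm, data = toy, FUN = sum)

# Apply the weights to the group totals
sums$w <- with(sums, (1.27 * d + 1.62 * t + 2.10 * h) / p)
sums[c("tm", "w")]
```

This works because sum() is a valid per-column function for aggregate(), whereas calc needs several columns at once.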