Let's say I have the following (dummy) data frame:
df <- data.frame(conference = c('East', 'East', 'East', 'West', 'West', 'East'),
                 team = c('A', 'A', 'A', 'B', 'B', 'C'),
                 points = c(11, 8, 10, 6, 6, 5),
                 rebounds = c(7, 7, 6, 9, 12, 8))
and I want to do some math on the points and rebounds columns. In base R, I could do something like:
a_val <- sum(df$points[which(df$team == "A")]) /
  sum(df$rebounds[which(df$team == "A")])
b_val <- sum(df$points[which(df$team == "B" & df$rebounds >= 7)]) /
  sum(df$rebounds[which(df$team == "B" & df$rebounds >= 7)])
What is the equivalent of which() in the tidyverse to make these kinds of operations more efficient?
The easiest route is to group the data frame by team and then do the calculations inside summarise():
library(tidyverse)
df <- data.frame(conference = c('East', 'East', 'East', 'West', 'West', 'East'),
                 team = c('A', 'A', 'A', 'B', 'B', 'C'),
                 points = c(11, 8, 10, 6, 6, 5),
                 rebounds = c(7, 7, 6, 9, 12, 8))
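df |>
  group_by(team) |>
  # a_val: all rows per team; b_val: only rows with rebounds >= 7.
  # The denominator sums the filtered rebounds, as in the question,
  # rather than counting the matching rows.
  summarise(a_val = sum(points) / sum(rebounds),
            b_val = sum(points[rebounds >= 7]) / sum(rebounds[rebounds >= 7]))
#> # A tibble: 3 × 3
#>   team  a_val b_val
#>   <chr> <dbl> <dbl>
#> 1 A     1.45  1.36
#> 2 B     0.571 0.571
#> 3 C     0.625 0.625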
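If only a single number per condition is wanted, as in the question, the closest dplyr counterpart to subsetting with which() is probably filter(). The following is a minimal sketch rather than the one right way to do it; the column name ratio is just an illustrative choice, and the native pipe assumes R 4.1 or later.

```r
library(dplyr)

df <- data.frame(
  conference = c("East", "East", "East", "West", "West", "East"),
  team = c("A", "A", "A", "B", "B", "C"),
  points = c(11, 8, 10, 6, 6, 5),
  rebounds = c(7, 7, 6, 9, 12, 8)
)

# filter() keeps the rows where the condition is TRUE,
# much like df[which(condition), ] in base R.
a_val <- df |>
  filter(team == "A") |>
  summarise(ratio = sum(points) / sum(rebounds)) |>  # 'ratio' is an illustrative name
  pull(ratio)                                        # extract the single value

# Several conditions separated by commas are combined with AND.
b_val <- df |>
  filter(team == "B", rebounds >= 7) |>
  summarise(ratio = sum(points) / sum(rebounds)) |>
  pull(ratio)
```

Here a_val and b_val should match the base R values from the question (29/20 and 12/21 respectively).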
We don't actually need which() in the question's example. Without it we are left with logical indexing, which works on both data frames and tibbles and gives the same answer here, e.g.
library(tibble)
tib <- as_tibble(df)
a_val <- sum(tib$points[tib$team == "A"]) / sum(tib$rebounds[tib$team == "A"])
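One caveat worth keeping in mind: dropping which() is only equivalent when the condition contains no NA values. A small base R sketch with made-up vectors (not the df from the question) shows the difference:

```r
# Hypothetical data with a missing team label
points <- c(11, 8, 10)
team <- c("A", NA, "A")

# which() returns only the positions that are TRUE, so the NA is skipped
with_which <- sum(points[which(team == "A")])

# A logical index containing NA produces an NA element, so the sum is NA
with_logical <- sum(points[team == "A"])
```

With missing values in play, either keep which() or pass na.rm = TRUE to sum().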