I want to find the difference between the cases that were observed and those that were not by type of case:
library(dplyr)
set.seed(42)
df <- data.frame(type = factor(rep(c("A", "B", "C"), 2)), observed = rep(c(TRUE, FALSE), 3),
                 val1 = sample(5:1, 6, replace = TRUE), val2 = sample(1:5, 6, replace = TRUE),
                 val3 = sample(letters[1:5], 6, replace = TRUE))
#   type observed val1 val2 val3
# 1    A     TRUE    1    4    e
# 2    B    FALSE    1    1    b
# 3    C     TRUE    4    4    c
# 4    A    FALSE    1    4    e
# 5    B     TRUE    2    3    e
# 6    C    FALSE    3    4    a
The following code works when there are only two different types of cases (e.g. levels(df$type) == c("A", "B")), but it does not work for the df provided above:
df %>%
group_by(type, observed) %>%
summarise_if(is.numeric, funs(diff(., 1)))
The desired output is:
# type val1 val2
# A 0 0
# B -1 -2
# C -1 0
This'll do it:
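df %>%
  group_by(type) %>%
  # within each type, put the observed row first and the unobserved row second
  arrange(type, desc(observed)) %>%
  # subtract the previous row from each row (0 is subtracted from the first row)
  mutate_if(is.numeric, funs(. - lag(., default = 0))) %>%
  # keep the last row per type, i.e. unobserved minus observed
  summarise_if(is.numeric, tail, 1)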
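# Output from my session. The values differ from the desired output in the
# question, presumably because sample() drew different numbers here:
# # A tibble: 3 x 3
#     type  val1  val2
#   <fctr> <dbl> <dbl>
# 1      A    -1     0
# 2      B    -2     0
# 3      C     3     1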
One of the dplyr wizards can probably come up with a more elegant approach, though.
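For what it's worth, here is a sketch of a shorter route. It assumes dplyr 1.0.0 or later (for across() and where(), and because funs() is deprecated in newer releases) and that every type has exactly one observed and one unobserved row. The data frame is typed in by hand from the printout in the question rather than sampled, so the result does not depend on the R version's random number generator.

```r
library(dplyr)

# Data frame rebuilt by hand from the printout in the question
# (an assumption: these are the values the asker is working with).
df <- data.frame(
  type     = factor(c("A", "B", "C", "A", "B", "C")),
  observed = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  val1     = c(1L, 1L, 4L, 1L, 2L, 3L),
  val2     = c(4L, 1L, 4L, 4L, 3L, 4L),
  val3     = c("e", "b", "c", "e", "e", "a")
)

# For each type, take the unobserved value minus the observed value
# in every numeric column. This relies on one TRUE and one FALSE row per type.
result <- df %>%
  group_by(type) %>%
  summarise(across(where(is.numeric), ~ .x[!observed] - .x[observed]))

print(result)
# val1: A = 0, B = -1, C = -1
# val2: A = 0, B = -2, C = 0
```

Indexing by observed directly avoids the arrange/lag/tail steps and makes the direction of the subtraction explicit. If a type could have several observed or unobserved rows, the values would need to be aggregated first (e.g. with mean()) before subtracting.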