I have a dataframe of names and years, with a dummy variable for whether the name occurred in a year or not.
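I'm trying to create a dataframe which tells me, for each year, the total number of names that occurred and how many of them are new (i.e. did not occur in the previous year).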
In the below example, in 2017 there is only one person occurring (Terry) and nothing for the previous year, so both total and new would be 1. In 2018 three people occur but only two are new as Terry occurred in the previous year. If somebody appeared in 2017 and 2019 but not in 2018, they should be classed as new in 2019.
EXAMPLE
Name x2017 x2018 x2019
1 Terry 1 1 0
2 Sam 0 0 1
3 Nic 0 1 1
4 Sarah 0 1 1
CODE
df <- data.frame(
  Name = c("Terry", "Sam", "Nic", "Sarah"),
  x2017 = c(1, 0, 0, 0),
  x2018 = c(1, 0, 1, 1),
  x2019 = c(0, 1, 1, 1)
)
OUTPUT I'M TRYING TO CREATE
Year Total New
1 2017 1 1
2 2018 3 2
3 2019 3 1
I've tried filtering and using row sums, but I feel like there's a function which I don't know of that can do this.
Thanks!
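The mutate(new = as.numeric(values == 1 & lag(values) == 0), new = ifelse(is.na(new), values, new)) %>% part is from stefan (credit to him, thank you stefan).
The differences are parse_number(), which turns the column names x2017, x2018, ... into numeric years, and group_by(Name), which keeps lag() from comparing one person's first year with the previous person's last year.
library(tidyverse)
# df is the data frame from the question
df %>%
  pivot_longer(
    cols = -Name,
    names_to = "Year",
    values_to = "values"
  ) %>%
  mutate(Year = parse_number(Year)) %>%
  # lag() must run within each Name, otherwise it reads the row above,
  # which may belong to a different person
  group_by(Name) %>%
  # new = 1 if the name occurs this year but did not occur the year before;
  # in the first year lag() is NA, so fall back to values
  mutate(new = as.numeric(values == 1 & lag(values) == 0),
         new = ifelse(is.na(new), values, new)) %>%
  group_by(Year) %>%
  summarise(Total = sum(values), New = sum(new))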
output:
Year Total New
* <dbl> <dbl> <dbl>
1 2017 1 1
2 2018 3 2
3 2019 3 1
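As a cross-check, the same counts can be sketched in base R by comparing each year column with the one before it. This is only a minimal sketch, not part of the answer above: it assumes the year columns are already in chronological order, are named x followed by the year, and contain only 0/1. The names m, prev and result are made up for illustration.

```r
# Example data from the question
df <- data.frame(
  Name  = c("Terry", "Sam", "Nic", "Sarah"),
  x2017 = c(1, 0, 0, 0),
  x2018 = c(1, 0, 1, 1),
  x2019 = c(0, 1, 1, 1)
)

# Matrix of the 0/1 year columns (everything except Name)
m <- as.matrix(df[, -1])

# The same matrix shifted one year to the right; the first year has no
# predecessor, so its "previous year" column is all zeros
prev <- cbind(0, m[, -ncol(m), drop = FALSE])

result <- data.frame(
  Year  = as.numeric(sub("^x", "", colnames(m))),  # "x2017" -> 2017
  Total = unname(colSums(m)),                      # names occurring per year
  New   = unname(colSums(m == 1 & prev == 0))      # occurring now, absent the year before
)
result
```

Because each row is compared only with itself, this does not depend on the order of the names, so it can be used to check the grouped lag() result on other data.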