Create new dataframe based on sequential row values

Question

I have a dataframe of names and years, with a dummy variable for whether the name occurred in a year or not.

I'm trying to create a dataframe which tells me

1. the total number of names which appeared in that year, and
2. the number of those which appeared in that year but not in the year before.

In the example below, only one person occurs in 2017 (Terry) and there is no previous year, so both total and new would be 1. In 2018 three people occur, but only two are new because Terry also occurred in 2017. If somebody appeared in 2017 and 2019 but not in 2018, they should be classed as new in 2019.

EXAMPLE

   Name x2017 x2018 x2019
1 Terry     1     1     0
2   Sam     0     0     1
3   Nic     0     1     1
4 Sarah     0     1     1

CODE

df <- data.frame(
  Name = c("Terry", "Sam", "Nic", "Sarah"),
  x2017 = c(1, 0, 0, 0),
  x2018 = c(1, 0, 1, 1),
  x2019 = c(0, 1, 1, 1)
)

OUTPUT I'M TRYING TO CREATE

  Year Total New
1 2017     1   1
2 2018     3   2
3 2019     3   1

I've tried filtering and using row sums, but I feel like there's a function which I don't know of that can do this.

Thanks!

TarJae · Accepted Answer

The mutate(new = as.numeric(values == 1 & lag(values) == 0), new = ifelse(is.na(new), values, new)) %>% part is from stefan (credit to him, thank you stefan). The difference in this answer is parse_number, which turns column names such as x2017 into numeric years.

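library(tidyverse)

# df is the data frame from the question
df %>% 
  pivot_longer(
    cols = -Name,
    names_to = "Year", 
    values_to = "values"
  ) %>% 
  # "x2017" -> 2017
  mutate(Year = parse_number(Year)) %>% 
  # group by Name so lag() never compares one person's first year
  # with the previous person's last year
  group_by(Name) %>% 
  # new = present this year and absent the year before;
  # in the first year lag() is NA, so fall back to values
  mutate(new = as.numeric(values == 1 & lag(values) == 0),
         new = ifelse(is.na(new), values, new)) %>% 
  group_by(Year) %>% 
  summarise(Total = sum(values), New = sum(new))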

output:

   Year Total   New
* <dbl> <dbl> <dbl>
1  2017     1     1
2  2018     3     2
3  2019     3     1
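
As a cross-check, here is a minimal base R sketch of the same idea, assuming the year columns are already in chronological order and hold only 0/1 values. It is not part of the original answer; the names m, prev and result are made up for illustration. Each year column is compared with the column before it, and the first year is compared with a column of zeros so that everyone present counts as new.

```r
# Example data from the question
df <- data.frame(
  Name  = c("Terry", "Sam", "Nic", "Sarah"),
  x2017 = c(1, 0, 0, 0),
  x2018 = c(1, 0, 1, 1),
  x2019 = c(0, 1, 1, 1)
)

# Names-by-years matrix of 0/1 flags (drop the Name column)
m <- as.matrix(df[, -1])

# The same matrix shifted one year to the right; the first year has no
# predecessor, so it gets a column of zeros (assumption: "no earlier
# data" is treated as "did not occur")
prev <- cbind(0, m[, -ncol(m), drop = FALSE])

result <- data.frame(
  Year  = as.numeric(sub("^x", "", colnames(m))),  # "x2017" -> 2017
  Total = unname(colSums(m)),                      # names present per year
  New   = unname(colSums(m == 1 & prev == 0))      # present now, absent before
)
result
```

Because the comparison is done row by row inside the matrix, one person's first year is never compared with another person's last year.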

Tags: r, dplyr, rolling-computation, accumulate, tidyverse

Sean

1 Answers

TarJae
