
Obtaining year-on-year percentage change by group

I'm working with a data set that resembles the extract below:

set.seed(1)
df <- data.frame(indicator=runif(n = 100),cohort=letters[1:4],
                 year=rep(1976:2000, each=4))

I would like to generate a variable holding the year-on-year percentage change for each cohort represented in the data set. I have tried the code below (from this discussion):

df$ind_per_chng <- transform(df, new.col=c(NA,indicator[-1]/indicator[-nrow(df)]-1))

but I'm interested in making it work within each subgroup and in generating only one extra column with the percentage change, instead of the set of columns that is currently created:

> head(df)
  indicator cohort year ind_per_chng.indicator ind_per_chng.cohort ind_per_chng.year
1 0.2655087      a 1976              0.2655087                   a              1976
2 0.3721239      b 1976              0.3721239                   b              1976
3 0.5728534      c 1976              0.5728534                   c              1976
4 0.9082078      d 1976              0.9082078                   d              1976
5 0.2016819      a 1977              0.2016819                   a              1977
6 0.8983897      b 1977              0.8983897                   b              1977
  ind_per_chng.new.col
1                   NA
2            0.4015509
3            0.5394157
4            0.5854106
5           -0.7779342
6            3.4544877

Edit

To answer the useful comments, the format of the output should correspond to the table below:

[Image: desired format]

There are no other changes to the original data.frame, with the exception of the added column that gives the percentage change of the selected variable for each cohort across years.

asked Dec 14 '22 12:12 by Konrad


1 Answer

I'm not sure I correctly understand what you want the output to look like, but is this what you're after?

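library(dplyr)

df2 <- df %>%
    group_by(cohort) %>%
    # sort by cohort, then year, so each cohort's rows are consecutive
    arrange(cohort, year) %>%
    # change relative to the previous year within the same cohort
    mutate(pct.chg = (indicator - lag(indicator)) / lag(indicator))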

If you want your percentages on a 0-100 scale instead of 0-1, wrap the expression in that last line in 100 * (), so mutate(pct.chg = 100 * ((indicator - lag(indicator))/lag(indicator))). Here's what the result looks like on the 0-100 scale:

  indicator cohort year    pct.chg
1 0.2655087      a 1976         NA
2 0.2016819      a 1977 -24.039416
3 0.6291140      a 1978 211.933767
4 0.6870228      a 1979   9.204818
5 0.7176185      a 1980   4.453369
6 0.9347052      a 1981  30.250993
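
If you'd rather avoid the dplyr dependency, something along these lines should also work in base R. This is only a sketch under a few assumptions: pct_change is a helper name I made up for illustration, the rows are sorted by cohort and year first, and the result is on the 0-1 scale. It also sidesteps the problem in the question, where the whole data frame returned by transform() was assigned to a single column, which is why several nested columns appeared.

```r
# Same data as in the question
set.seed(1)
df <- data.frame(indicator = runif(n = 100), cohort = letters[1:4],
                 year = rep(1976:2000, each = 4))

# Hypothetical helper: fractional change from the previous element,
# with NA for the first element because it has no predecessor
pct_change <- function(x) c(NA, x[-1] / x[-length(x)] - 1)

# Sort so that years are in order within each cohort
df <- df[order(df$cohort, df$year), ]

# ave() applies the helper within each cohort and returns a plain vector,
# so exactly one new column is added
df$pct.chg <- ave(df$indicator, df$cohort, FUN = pct_change)

head(df)
```

Multiply the new column by 100 if you want the 0-100 scale. The first year of each cohort is NA because there is no previous year to compare against.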
answered Dec 28 '22 07:12 by ulfelder