I'm working with a data set corresponding to the extract:
set.seed(1)
df <- data.frame(indicator=runif(n = 100),cohort=letters[1:4],
year=rep(1976:2000, each=4))
I would like to generate a variable with the year-on-year percentage change for each cohort
represented in the data set. I have tried to use the code below (from this discussion):
df$ind_per_chng <- transform(df, new.col=c(NA,indicator[-1]/indicator[-nrow(df)]-1))
but I'm interested in making it work within each subgroup and generating only one extra column with the percentage change, instead of the set of columns that is currently created:
> head(df)
  indicator cohort year ind_per_chng.indicator ind_per_chng.cohort ind_per_chng.year
1 0.2655087      a 1976              0.2655087                   a              1976
2 0.3721239      b 1976              0.3721239                   b              1976
3 0.5728534      c 1976              0.5728534                   c              1976
4 0.9082078      d 1976              0.9082078                   d              1976
5 0.2016819      a 1977              0.2016819                   a              1977
6 0.8983897      b 1977              0.8983897                   b              1977
  ind_per_chng.new.col
1                   NA
2            0.4015509
3            0.5394157
4            0.5854106
5           -0.7779342
6            3.4544877
To answer the useful comments, the format of the output should correspond to the table below:
There are no other changes to the original data.frame, with the exception of the column that provides the percentage change in the selected variable for each cohort across years.
I'm not sure I correctly understand what you want the output to look like, but is this what you're after?
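library(dplyr)

df2 <- df %>%
  group_by(cohort) %>%
  # sort by cohort, then year, so each cohort's rows are in chronological order
  arrange(cohort, year) %>%
  # lag() is applied within each cohort, so the first year of each cohort is NA
  mutate(pct.chg = (indicator - lag(indicator)) / lag(indicator))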
If you want your percentages on a 0-100 scale instead of 0-1, multiply by 100 in that last line, so mutate(pct.chg = 100 * ((indicator - lag(indicator))/lag(indicator))). Here's what the result looks like on the 0-100 scale:
  indicator cohort year    pct.chg
1 0.2655087      a 1976         NA
2 0.2016819      a 1977 -24.039416
3 0.6291140      a 1978 211.933767
4 0.6870228      a 1979   9.204818
5 0.7176185      a 1980   4.453369
6 0.9347052      a 1981  30.250993
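If you'd rather not load dplyr, the same per-cohort calculation can probably be done in base R with ave(). This is only a sketch under the assumption that the data look like the example in the question; the column name pct.chg is reused from above, and the anonymous function is just for illustration.

```r
set.seed(1)
df <- data.frame(indicator = runif(n = 100), cohort = letters[1:4],
                 year = rep(1976:2000, each = 4))

# Sort so that, within each cohort, rows run in year order.
df <- df[order(df$cohort, df$year), ]

# ave() applies FUN to the indicator values of each cohort separately
# and puts the results back in the matching row positions.
# The first year of each cohort has no previous value, hence the NA.
df$pct.chg <- ave(df$indicator, df$cohort,
                  FUN = function(x) c(NA, 100 * (x[-1] / x[-length(x)] - 1)))

head(df)
```

The first rows should agree with the dplyr result shown above (0-100 scale). Drop the 100 * if you want the change on a 0-1 scale.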