I have a yearly time series data frame with a few grouping variables, and I need to add an index column that is based on a particular year.
df <- data.frame(YEAR = c(2000,2001,2002,2000,2001,2002),
GRP = c("A","A","A","B","B","B"),
VAL = sample(6))
I want to make a simple index of the variable VAL: each value divided by the value in the base year, say 2000:
df$VAL.IND <- df$VAL/df$VAL[df$YEAR == 2000]
This is not right as it does not respect the grouping variable GRP. I tried with plyr but I could not make it work.
In my actual problem I have several grouping variables with varying time series, so I'm looking for a fairly general solution.
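To make the failure concrete, here is a minimal sketch that swaps sample(6) for fixed values (the numbers are assumptions chosen for illustration). The ungrouped division picks up one base value per group and then recycles that short vector across all rows, so some rows of 'A' end up divided by the base value of 'B':

```r
# Fixed values instead of sample(6) so the result is reproducible
df <- data.frame(YEAR = c(2000, 2001, 2002, 2000, 2001, 2002),
                 GRP  = c("A", "A", "A", "B", "B", "B"),
                 VAL  = c(10, 20, 30, 100, 200, 300))

# One base value per group: c(10, 100)
base_vals <- df$VAL[df$YEAR == 2000]

# base_vals is recycled as 10, 100, 10, 100, 10, 100,
# regardless of which group each row belongs to
naive <- df$VAL / base_vals
naive   # 1.0  0.2  3.0  1.0 20.0  3.0
```

A per-group index for this data would be 1, 2, 3 within each group.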
We can create the 'VAL.IND' after doing the calculation within the grouping variable ('GRP'). This can be done in many ways.
One option is data.table, where we convert the 'data.frame' to a 'data.table' (setDT(df)) and, grouped by 'GRP', divide 'VAL' by the 'VAL' that corresponds to the 'YEAR' value of 2000.
library(data.table)
setDT(df)[, VAL.IND := VAL/VAL[YEAR==2000], by = GRP]
NOTE: The base YEAR is a bit ambiguous with respect to the result. In the example, both the 'A' and 'B' GRP have 'YEAR' 2000. If the OP instead meant the minimum YEAR value within each group (assuming it is a numeric column), VAL/VAL[YEAR==2000] in the above code can be replaced with VAL/VAL[which.min(YEAR)].
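The difference between the two expressions in the note shows up as soon as a group has no row for 2000. The following base R sketch uses made-up data in which group 'B' starts in 2001, and looks at that group on its own:

```r
# Hypothetical data: group "B" only starts in 2001, so it has no row for 2000
df <- data.frame(YEAR = c(2000, 2001, 2002, 2001, 2002),
                 GRP  = c("A", "A", "A", "B", "B"),
                 VAL  = c(10, 20, 30, 50, 100))

b <- df[df$GRP == "B", ]

b$VAL[b$YEAR == 2000]       # numeric(0): nothing to divide by
b$VAL[which.min(b$YEAR)]    # 50: the value in B's first year, 2001

# Index relative to the group's first available year
first_year_index <- b$VAL / b$VAL[which.min(b$YEAR)]
first_year_index            # 1 2
```

Which of the two is appropriate depends on whether the index should use one fixed base year for every group or each group's own first year.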
Or you can use similar code with dplyr. We group by 'GRP' and use mutate to create 'VAL.IND'.
library(dplyr)
df %>%
group_by(GRP) %>%
mutate(VAL.IND = VAL/VAL[YEAR==2000])
Here also, if needed, replace VAL/VAL[YEAR==2000] with VAL/VAL[which.min(YEAR)].
A base R option uses split/unsplit. We split the dataset by the 'GRP' column to convert the data.frame to a list of data frames, loop through the list with lapply, create a new column using transform (or within), and convert the list with the added column back to a single data.frame with unsplit.
unsplit(lapply(split(df, df$GRP), function(x)
transform(x, VAL.IND= VAL/VAL[YEAR==2000])), df$GRP)
Note that we can also use do.call(rbind, ...) instead of unsplit, but I prefer unsplit because it returns the rows in the same order as the original dataset.
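The row-order point is easiest to see when the groups are not already sorted. This sketch uses a small made-up data frame with interleaved groups and compares the two ways of reassembling the list:

```r
# Hypothetical data with the groups interleaved rather than sorted
df <- data.frame(YEAR = c(2000, 2000, 2001, 2001),
                 GRP  = c("B", "A", "B", "A"),
                 VAL  = c(100, 10, 200, 20),
                 stringsAsFactors = FALSE)

# Same per-group calculation as above
pieces <- lapply(split(df, df$GRP), function(x)
  transform(x, VAL.IND = VAL / VAL[YEAR == 2000]))

by_unsplit <- unsplit(pieces, df$GRP)   # rows come back in the original order
by_rbind   <- do.call(rbind, pieces)    # rows are stacked group by group

by_unsplit$GRP   # "B" "A" "B" "A"
by_rbind$GRP     # "A" "A" "B" "B"
```

The computed values are the same in both results; only the row order (and the row names) differ.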