
Adding a base year index to R dataframe with multiple groups

Tags: dataframe, r
I have a yearly time series dataframe with a few grouping variables, and I need to add an index column that is based on a particular year.

df <- data.frame(YEAR = c(2000,2001,2002,2000,2001,2002), 
                 GRP = c("A","A","A","B","B","B"),
                 VAL = sample(6))

I want to make a simple index of the variable VAL: each value divided by the value of the base year, say 2000:

df$VAL.IND <- df$VAL/df$VAL[df$YEAR == 2000]

This is not right, as it does not respect the grouping variable GRP. I tried with plyr but could not make it work.

In my actual problem I have several grouping variables with time series of varying length, so I'm looking for a fairly general solution.

Antti asked Jan 08 '23 21:01


1 Answer

We can create the 'VAL.IND' after doing the calculation within the grouping variable ('GRP'). This can be done in many ways.

One option is data.table. We convert the 'data.frame' to a 'data.table' (setDT(df)) and, grouped by 'GRP', divide 'VAL' by the 'VAL' that corresponds to the 'YEAR' value of 2000.

 library(data.table)
 setDT(df)[, VAL.IND := VAL/VAL[YEAR==2000], by = GRP]

NOTE: It is not entirely clear which base YEAR the result should use. In the example, both the 'A' and 'B' groups have a 'YEAR' of 2000. If the OP meant the minimum YEAR within each group (assuming it is a numeric column), VAL/VAL[YEAR==2000] in the above code can be replaced with VAL/VAL[which.min(YEAR)].
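
The two subsetting expressions behave differently when a group has no row for the base year. Here is a minimal base R sketch of that case. The vectors val and year are made up for illustration and stand in for the columns of a single group. The match() variant is an added suggestion, not something the question asked for:

```r
# Hypothetical single group whose series starts in 2001 (no row for 2000)
val  <- c(10, 20, 40)
year <- c(2001, 2002, 2003)

# Fixed base year: the subset is empty, so the result has length 0
val / val[year == 2000]        # numeric(0)

# Earliest year in the group as the base
val / val[which.min(year)]     # 1 2 4

# Fixed base year, but NA when the base year is absent
val / val[match(2000, year)]   # NA NA NA
```

A zero-length result cannot fill a column of the group's length, so for series that start in different years which.min(YEAR) or the match() form may be the safer choice.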


Or you can use similar code with dplyr. We group by 'GRP' and use mutate to create 'VAL.IND'.

 library(dplyr)
 df %>%
    group_by(GRP) %>%
    mutate(VAL.IND = VAL/VAL[YEAR==2000])

Here too, if needed, replace VAL/VAL[YEAR==2000] with VAL/VAL[which.min(YEAR)].


A base R option uses split/unsplit. We split the dataset by the 'GRP' column to convert the data.frame to a list of data.frames, loop through the list with lapply, create the new column using transform (or within), and convert the list with the added column back to a single data.frame with unsplit.

  unsplit(lapply(split(df, df$GRP), function(x) 
          transform(x, VAL.IND= VAL/VAL[YEAR==2000])), df$GRP)

Note that we can also use do.call(rbind, ...) instead of unsplit, but I prefer unsplit because it returns the rows in the same order as the original dataset.
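
The row-order difference only becomes visible when the groups are interleaved. Here is a small sketch using a made-up data.frame, df2, which is not the question's data:

```r
# Hypothetical data with the groups interleaved rather than sorted
df2 <- data.frame(YEAR = c(2000, 2000, 2001, 2001),
                  GRP  = c("B", "A", "B", "A"),
                  VAL  = c(2, 5, 4, 10))

# Index each group against its own value for the year 2000
parts <- lapply(split(df2, df2$GRP), function(x)
  transform(x, VAL.IND = VAL / VAL[YEAR == 2000]))

by_unsplit <- unsplit(parts, df2$GRP)   # rows back in the original order
by_rbind   <- do.call(rbind, parts)     # rows stacked group by group

as.character(by_unsplit$GRP)   # "B" "A" "B" "A"
as.character(by_rbind$GRP)     # "A" "A" "B" "B"
```

do.call(rbind, ...) also prefixes the row names with the group name, so unsplit is probably the more convenient choice when the result has to line up with the input.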

akrun answered Jan 21 '23 13:01