Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Scale relative to a value in each group (via dplyr)

Tags:

r

dplyr

I have a set of time series, and I want to scale each of them relative to their value in a specific interval. That way, each series will be at 1.0 at that time and change proportionally.

I can't figure out how to do that with dplyr.

Here's a working example using a for loop:

library(dplyr)

data = expand.grid(
  category = LETTERS[1:3],
  year = 2000:2005)
data$value = runif(nrow(data))

# the first time point in the series
baseYear = 2002

# for each category, divide all the values by the category's value in the base year
for(category in as.character(levels(factor(data$category)))) {
  data[data$category == category,]$value = data[data$category == category,]$value / data[data$category == category & data$year == baseYear,]$value[[1]]
}

Edit: Modified the question such that the base time point is not indexable. Sometimes the "time" column is actually a factor, which isn't necessarily ordinal.

like image 754
sharoz Avatar asked Nov 25 '14 01:11

sharoz


2 Answers

This solution is very similar to @thelatemail, but I think it's sufficiently different enough to merit its own answer because it chooses the index based on a condition:

data %>%
    group_by(category) %>%
    mutate(value = value/value[year == baseYear])

#   category  year      value
#...     ...   ...       ...
#7         A  2002 1.00000000
#8         B  2002 1.00000000
#9         C  2002 1.00000000
#10        A  2003 0.86462789
#11        B  2003 1.07217943
#12        C  2003 0.82209897

(Data output has been truncated. To replicate these results, set.seed(123) when creating data.)

like image 114
oshun Avatar answered Oct 21 '22 13:10

oshun


Use first in dplyr, ensuring you use order_by

data %>% 
  group_by(category) %>% 
  mutate(value = value / first(value, order_by = year))
like image 39
Hugh Avatar answered Oct 21 '22 12:10

Hugh