I have a set of time series, and I want to scale each of them relative to their value in a specific interval. That way, each series will be at 1.0 at that time and change proportionally.
I can't figure out how to do that with dplyr.
Here's a working example using a for loop:
library(dplyr)
data = expand.grid(
category = LETTERS[1:3],
year = 2000:2005)
data$value = runif(nrow(data))
# the first time point in the series
baseYear = 2002
# for each category, divide all the values by the category's value in the base year
for(category in as.character(levels(factor(data$category)))) {
data[data$category == category,]$value = data[data$category == category,]$value / data[data$category == category & data$year == baseYear,]$value[[1]]
}
Edit: Modified the question such that the base time point is not indexable. Sometimes the "time" column is actually a factor, which isn't necessarily ordinal.
This solution is very similar to @thelatemail, but I think it's sufficiently different enough to merit its own answer because it chooses the index based on a condition:
data %>%
group_by(category) %>%
mutate(value = value/value[year == baseYear])
# category year value
#... ... ... ...
#7 A 2002 1.00000000
#8 B 2002 1.00000000
#9 C 2002 1.00000000
#10 A 2003 0.86462789
#11 B 2003 1.07217943
#12 C 2003 0.82209897
(Data output has been truncated. To replicate these results, set.seed(123)
when creating data
.)
Use first
in dplyr, ensuring you use order_by
data %>%
group_by(category) %>%
mutate(value = value / first(value, order_by = year))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With