I have a dataset that's columns are like this:
Forecaster CountryName ProductName X2022 X2023 file_month
Allianz USA Inflation 6.1 2.5 7
EIU USA Inflation 7.7 3.7 7
Oxford USA Inflation 6.5 2.9 7
Allianz USA Inflation NA 3.3 8
EIU USA Inflation 8.3 3.7 8
Data is in long format, file_month shows the month that the data was made. Now i need to create new column that refers to updated values by Forecaster, CountryName, ProductName and either X2022 or X2023. So in this new column it should be checked if Forecaster revised it's forecast compared to one month before and if so, the new revised value should be written in the column.
Also, as there is more than one ProductName, CountryName and Forecaster in whole dataset, I would like to know how could i make this work for every of these groups.
New data would need to look something like this
Forecaster CountryName ProductName X2022 X2023 file_month update_2022 update_2023
Allianz USA Inflation 6.1 2.5 7
EIU USA Inflation 7.7 3.7 7
Oxford USA Inflation 6.5 2.9 7
Allianz USA Inflation NA 3.3 8 NA 3.3
EIU USA Inflation 8.3 3.7 8 8.3 NA
For code reproducability i add dput() output.
data <- structure(list(Forecaster = c("Moody's Analytics", "Oxford Economics",
"Julius Baer", "Fitch Solutions", "SEB", "Allianz", "EIU", "FrontierView",
"Euromonitor Int.", "Fitch Ratings", "Scope Ratings", "Allianz",
"EIU", "Euromonitor Int.", "Fitch Ratings", "Fitch Solutions",
"FrontierView", "Julius Baer", "Moody's Analytics", "Oxford Economics",
"Scope Ratings", "SEB", "Allianz", "EIU", "Euromonitor Int.",
"Fitch Ratings", "Fitch Solutions", "FrontierView", "Julius Baer",
"Moody's Analytics", "Oxford Economics", "Scope Ratings", "SEB",
"Allianz", "EIU", "Euromonitor Int.", "Fitch Ratings", "Fitch Solutions",
"FrontierView", "Julius Baer", "Moody's Analytics", "Oxford Economics",
"Scope Ratings", "SEB", "Allianz", "EIU", "Euromonitor Int.",
"Fitch Ratings", "Fitch Solutions", "FrontierView", "Julius Baer",
"Moody's Analytics", "Oxford Economics", "Scope Ratings"),
CountryName = c("USA",
"USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA",
"USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA",
"USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA",
"USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA",
"USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA",
"USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA"),
ProductName = c("Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation"),
X2022 = c(6.58180388987726,
7.39615, 5.71530516215215, 6.5, NA, 6.1, 7.2, 6.2, 7.31, 6.9,
NA, 6.1, 7.7, 7.49, 6.9, 6.5, 6.2, 7.01535416655227, 6.94177167480419,
7.40344, NA, 7.5, 6.1, 7.7, 7.49, 6.9, 6.5, 6.2, 7.01535416655227,
6.94177167480419, 7.40344, NA, 7.5, NA, 8.3, 7.99, 7.8313, 7.3,
8.4, 7.45242025035952, 7.51681072158563, 8.04052, NA, 7.5, 8.2,
8.1, 8.01, 7.8313, 7.7, 8.6, 7.94553624147252, 7.82785941572661,
8.11193, NA),
X2023 = c(2.62716826359282, 1.73992, 2.32590118928029,
2.5, NA, 2.5, 2.8, 3.3, 4.02, 2.7, NA, 2.5, 3.7, 4.01, 2.7, 2.5,
3.3, 2.96909559451206, 2.70488419782626, 2.01984, NA, 4.2, 2.5,
3.7, 4.01, 2.7, 2.5, 3.3, 2.96909559451206, 2.70488419782626,
2.01984, NA, 4.2, NA, 3.7, 4.21, 3.7199, 3.5, 4.9, 3.02315479834998,
3.10232274684858, 3.17401, NA, 4.2, 3.2, 3.8, 4, 3.7199, 3.7,
5.7, 3.48906158848119, 3.36435085973137, 3.27724, NA),
file_month = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L)), class = "data.frame", row.names = c(NA,-54L))
Here are some notes about how things work here:
dplyr::group_split function that creates a list where each elements is your data set with a unique Forecaster value.purrr::map_dfr to apply a function on each elements of the previous list. In a way that we create a custom function and apply it on each data frame. dfr suffice implies that in the end the map binds all the modified data frames together by row.zoo, runner, slider and etc. I used slider because it works well with tidyverse functions.slider::slide_dbl function where it takes a given function and apply it on a sliding window of the size that we choose. Here we would like to compare each value corresponding to a month to its previous value so the size of our sliding window is 2 as specified by .before argument.if, else clause to apply what you have in mind..complete argument in slider_dbl to TRUE, so that the function is only applied on a complete window. It means for the first month that appears in the data set there is no output value cause there is no previous month.slider_dbl function on 2 columns of each data frame so I used it inside dplyer::across.library(tidyverse)
library(slider)
data %>%
group_split(Forecaster) %>%
map_dfr(~ .x %>%
mutate(across(c(X2022, X2023), ~ slide_dbl(., .f = ~ if(is.na(.x[2])) {
.x[2]
} else if(is.na(.x[1]) & !is.na(.x[2])){
.x[2]
} else if((.x[1] - .x[2]) != 0) {
.x[2]
} else {
NA_real_
}, .before = 1, .complete = TRUE), .names = 'Updated_{.col}')))
# A tibble: 54 × 8
Forecaster CountryName ProductName X2022 X2023 file_month Updated_X2022 Updated_X2023
<chr> <chr> <chr> <dbl> <dbl> <int> <dbl> <dbl>
1 Allianz USA Inflation 6.1 2.5 5 NA NA
2 Allianz USA Inflation 6.1 2.5 6 NA NA
3 Allianz USA Inflation 6.1 2.5 7 NA NA
4 Allianz USA Inflation NA NA 8 NA NA
5 Allianz USA Inflation 8.2 3.2 9 8.2 3.2
6 EIU USA Inflation 7.2 2.8 5 NA NA
7 EIU USA Inflation 7.7 3.7 6 7.7 3.7
8 EIU USA Inflation 7.7 3.7 7 NA NA
9 EIU USA Inflation 8.3 3.7 8 8.3 NA
10 EIU USA Inflation 8.1 3.8 9 8.1 3.8
# … with 44 more rows
I suggest you run this code line by line to check the output and get a sense of what is going on here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With