I'm working with two data frames. Here is a sample:
DF1 is the main table. Its Equation column holds formulas that usually contain variables:
DF1 <- read.table(text =
"Unit Year Equation
1 2020 'x+2*y'
1 2021 'x+2*y'
1 2022 'x+2*y'
2 2020 'x'
3 2020 'max(y^2, y+2*z)'
3 2021 'max(y^2, y+2*z)'
4 2020 '5'
5 2020 '(x/y)+z'",
header = TRUE, stringsAsFactors = FALSE)
DF2 is the reference (lookup) table, which assigns a Value to each Variable for a given Year:
DF2 <- read.table(text =
"Year Variable Value
2020 x 10
2021 x 15.5
2022 x 50
2020 y 1
2021 y 2
2022 y 3.5
2020 z 20
2021 z 34
2022 z 11",
header = TRUE, stringsAsFactors = FALSE)
The goal is to match the variables and the years between the 2 dataframes so that the following table could be derived after applying eval(parse(text=Equation)) or anything similar:
Unit Year Equation
1 2020 12
1 2021 19.5
1 2022 57
2 2020 10
3 2020 41
3 2021 70
4 2020 5
5 2020 30
Currently I'm using nested for loops with an if check to match the Years and replace the Variables row by row. It works, but it has become very slow because DF1 can contain thousands of rows with several variables. Are there other functions I could use to achieve the same output?
Edit - Adding in the loop mentioned to help with comparison:
library(dplyr)
library(reshape2)
DF2 = dcast(DF2, Year~Variable, value.var='Value')
#Adding in this line to avoid replacing "x" in "max":
DF1$Equation = gsub("max","placeholder",DF1$Equation)
for (i in 1:nrow(DF1)) {
  for (j in 1:nrow(DF2)) {
    if (DF1[i,]$Year == DF2[j,]$Year) {
      #Every variable would be declared here:
      DF1[i,]$Equation = gsub("x", DF2[j,]$x, DF1[i,]$Equation)
      DF1[i,]$Equation = gsub("y", DF2[j,]$y, DF1[i,]$Equation)
      DF1[i,]$Equation = gsub("z", DF2[j,]$z, DF1[i,]$Equation)
    }
  }
}
#Returning the function:
DF1$Equation = gsub("placeholder","max",DF1$Equation)
Results_DF1 = DF1 %>% rowwise() %>%
mutate(Equation = eval(parse(text=Equation)))
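As a side note on the "placeholder" workaround above: a word-boundary pattern would avoid touching the "x" inside "max" in the first place. The following is only a minimal sketch on a single made-up equation; the `equation` and `values` objects are hypothetical and not part of the original data. Wrapping each substituted value in parentheses is an added precaution so that a negative value would not change operator precedence.

```r
# Hypothetical one-row example: substitute values for whole-word variable names only.
equation <- "max(y^2, y+2*x)"
values <- c(x = 10, y = 1)  # assumed lookup values for a single year

for (v in names(values)) {
  # \\b is a word boundary, so the "x" inside "max" is not matched.
  pattern <- paste0("\\b", v, "\\b")
  # Parentheses keep the substituted number together (relevant for negatives).
  replacement <- paste0("(", values[[v]], ")")
  equation <- gsub(pattern, replacement, equation, perl = TRUE)
}

result <- eval(parse(text = equation))
```

This still rewrites strings row by row, so it addresses the "max" problem only, not the speed problem.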
I noticed that you have edited DF1, so I used the updated version and no further edit is needed on my part:
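library(dplyr)
library(tidyr)   # provides pivot_wider()
library(rlang)   # provides parse_expr()
DF1 %>%
  # Reshape the lookup table to one row per Year, one column per Variable
  left_join(DF2 %>%
              pivot_wider(names_from = Variable, values_from = Value),
            by = "Year") %>%
  rowwise() %>%
  # Evaluate each row's equation against that row's x, y and z
  mutate(Result = eval(parse_expr(Equation)))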
# A tibble: 8 x 7
# Rowwise:
Unit Year Equation x y z Result
<int> <int> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 2020 x+2*y 10 1 20 12
2 1 2021 x+2*y 15.5 2 34 19.5
3 1 2022 x+2*y 50 3.5 11 57
4 2 2020 x 10 1 20 10
5 3 2020 max(y^2, y+2*z) 10 1 20 41
6 3 2021 max(y^2, y+2*z) 15.5 2 34 70
7 4 2020 5 10 1 20 5
8 5 2020 (x/y)+z 10 1 20 30
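For comparison, the same idea (look up the year's values, then evaluate the equation against them) can be sketched in base R. This is only an illustration under a few assumptions: the names `lookup`, `eqs` and `parsed` are made up here, each Year/Variable pair is assumed to appear once in DF2, and each distinct equation is parsed only once, which may or may not help with speed on real data. Using `enclos = baseenv()` is a design choice so that only base functions such as `max` and the looked-up variables are visible to the equation.

```r
# Sample data from the question, rebuilt so the sketch is self-contained.
DF1 <- data.frame(
  Unit = c(1, 1, 1, 2, 3, 3, 4, 5),
  Year = c(2020, 2021, 2022, 2020, 2020, 2021, 2020, 2020),
  Equation = c("x+2*y", "x+2*y", "x+2*y", "x", "max(y^2, y+2*z)",
               "max(y^2, y+2*z)", "5", "(x/y)+z"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  Year = rep(c(2020, 2021, 2022), times = 3),
  Variable = rep(c("x", "y", "z"), each = 3),
  Value = c(10, 15.5, 50, 1, 2, 3.5, 20, 34, 11),
  stringsAsFactors = FALSE
)

# Year-by-Variable matrix: row names are years, column names are variables.
lookup <- tapply(DF2$Value, list(DF2$Year, DF2$Variable), sum)

# Parse each distinct equation once instead of once per row.
eqs <- unique(DF1$Equation)
parsed <- setNames(lapply(eqs, function(e) parse(text = e)[[1]]), eqs)

# Evaluate row by row against that row's year of values.
DF1$Result <- vapply(seq_len(nrow(DF1)), function(i) {
  vars <- as.list(lookup[as.character(DF1$Year[i]), ])
  eval(parsed[[DF1$Equation[i]]], envir = vars, enclos = baseenv())
}, numeric(1))
```

The evaluation stays row by row because `max()` collapses a whole vector to one number, so an equation containing it cannot simply be evaluated once over entire columns.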
You could do:
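library(dplyr)
library(tidyr)
left_join(DF1, DF2, by = 'Year') %>%
  # Named arguments: recent tidyr versions no longer accept these positionally
  pivot_wider(id_cols = c(Unit, Year, Equation),
              names_from = Variable, values_from = Value) %>%
  rowwise() %>%
  mutate(a = eval(parse(text = Equation)))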
Unit Year Equation x y z a
<int> <int> <chr> <dbl> <dbl> <dbl> <dbl>
1 1 2020 x+2*y 10 1 20 12
2 1 2021 x+2*y 15.5 2 34 19.5
3 1 2022 x+2*y 50 3.5 11 57
4 2 2020 x 10 1 20 10
5 3 2020 max(y^2, y+2*z) 10 1 20 41
6 3 2021 max(y^2, y+2*z) 15.5 2 34 70
7 4 2020 5 10 1 20 5
8 5 2020 (x/y)+z 10 1 20 30
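Both approaches rely on every variable in an equation being present in the lookup table. If one is missing, `eval()` may fail or silently pick up an unrelated object of the same name from the workspace. A possible pre-check, sketched here with base R's `all.vars()`, lists the variables each equation needs; the `equations` vector (including the made-up variable `w`) and the `known` vector are hypothetical examples.

```r
# Hypothetical equations; the last one uses "w", which has no lookup value.
equations <- c("x+2*y", "max(y^2, y+2*z)", "5", "(x/y)+w")
known <- c("x", "y", "z")  # variables assumed to be available in the lookup table

# all.vars() returns variable names only; function names such as max are excluded.
needed <- lapply(equations, function(e) all.vars(parse(text = e)))

# Variables each equation needs that the lookup table cannot supply.
missing_vars <- lapply(needed, setdiff, known)
names(missing_vars) <- equations

# Equations that cannot be evaluated safely.
problem_equations <- names(Filter(length, missing_vars))
```

Rows whose equations show up in `problem_equations` could then be reported or skipped before any evaluation takes place.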