I have the below data set:
DT <- fread(" df1 df2
1 8
2 9
3 10
4 11
5 12")
I want to create a new column df3 with first value equal to 100 and then lag(df3, 1) * (1 + df2). So the final output will be:
  df1 df2     df3
1   1   8     100
2   2   9    1000
3   3  10   11000
4   4  11  132000
5   5  12 1716000
I have tried running DT[, df3 := lag(df3, 1) * (1 + df2)], but because df3 does not exist yet, I get an error.
I'm leaving my previous answer below as it had some success, but I had overlooked that it would be much faster with cumprod:
DT$df3 <- 100 * cumprod(c(0,DT$df2[-1])+1) # base R
DT[, df3:= 100 * cumprod(c(0,df2[-1])+1)] # data.table
DT %>% mutate(df3 = 100 * cumprod(c(0,df2[-1])+1)) # tidyverse (only dplyr here)
We compute the cumulative product of df2 + 1, ignoring the first element and starting with 1, and we multiply it by 100.
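As a sanity check, here is a minimal base R sketch that writes the recurrence from the question as an explicit loop and compares it with the cumprod version. The vector df2 is copied from the question's data, and the names df3_loop and df3_cumprod are only illustrative, not part of the original answer.

```r
# Values of df2 taken from the question's example data
df2 <- c(8, 9, 10, 11, 12)

# Explicit recurrence: df3[1] = 100, df3[i] = df3[i - 1] * (1 + df2[i])
df3_loop <- numeric(length(df2))
df3_loop[1] <- 100
for (i in seq_along(df2)[-1]) {
  df3_loop[i] <- df3_loop[i - 1] * (1 + df2[i])
}

# Vectorised version: replace the first element by 0 so its factor is 1
df3_cumprod <- 100 * cumprod(c(0, df2[-1]) + 1)

# Both approaches should agree
stopifnot(isTRUE(all.equal(df3_loop, df3_cumprod)))
df3_loop
```

The loop is only there to make the recurrence visible; on a real table the cumprod form avoids the element-by-element iteration.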
Previous answer with Reduce:
This is a good job for Reduce. The function we're using is simple multiplication; then we make sure to add 1 to df2 and ignore the first value, start from 100 (init = 100), and keep the intermediate results (accumulate = TRUE). Code:
DT$df3 <- Reduce(`*`,DT$df2[-1]+1,init = 100,accumulate = TRUE)
DT
#    df1 df2     df3
# 1:   1   8     100
# 2:   2   9    1000
# 3:   3  10   11000
# 4:   4  11  132000
# 5:   5  12 1716000
This works with base R. To use more idiomatic syntax with data.table, one can follow @jogo's advice and write:
DT[, df3:=Reduce('*', df2[-1]+1, init = 100,accumulate = TRUE)]
And for completeness, this would be the tidyverse way:
library(tidyverse)
DT %>% mutate(df3 = accumulate(df2[-1]+1,`*`,.init = 100))
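To illustrate what accumulate = TRUE contributes in the Reduce versions, here is a small base R sketch using the same numbers as the question. The names factors, final_only and all_steps are made up for this example.

```r
# df2 without its first value, plus 1: the factors applied after the initial 100
factors <- c(9, 10, 11, 12) + 1

# Without accumulate, Reduce returns only the final product
final_only <- Reduce(`*`, factors, init = 100)

# With accumulate = TRUE, it returns the initial value and every intermediate product
all_steps <- Reduce(`*`, factors, init = 100, accumulate = TRUE)

final_only
all_steps
```

Because init is included in the accumulated result, all_steps has one element more than factors, which is why the first value of df2 has to be dropped to get a column of the right length.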