
Create a new column based on column that does not yet exist

Tags: r, data.table

I have the below data set:

library(data.table)

DT <- fread("   df1 df2
  1   8
  2   9
  3  10
  4  11
  5  12")

I want to create a new column df3 whose first value is 100 and whose every later value is lag(df3, 1) * (1 + df2), i.e. the previous df3 value multiplied by (1 + df2) of the current row. The final output should be:

  df1 df2     df3
1   1   8     100
2   2   9    1000
3   3  10   11000
4   4  11  132000
5   5  12 1716000

I have tried running DT[, df3 := lag(df3, 1) * (1 + df2)], but because df3 does not exist yet, I get an error.

Maylo asked Jun 05 '18 12:06




1 Answer

I'm leaving my previous answer below since it was well received, but I had overlooked that cumprod would be much faster:

DT$df3 <-  100 * cumprod(c(0,DT$df2[-1])+1)        # base R
DT[, df3:= 100 * cumprod(c(0,df2[-1])+1)]          # data.table
DT %>% mutate(df3 = 100 * cumprod(c(0,df2[-1])+1)) # tidyverse (only dplyr here)

We take the cumulative product of df2 + 1, with the first element replaced by 1 (so the first row contributes no growth), and multiply the result by 100.
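To see why the cumulative product matches the recurrence in the question, here is a minimal base R sketch that computes df3 both ways and compares them. The vector df2 mirrors the question's column; the names df3_loop and df3_cumprod are made up for this illustration.

```r
# Values of the df2 column from the question
df2 <- c(8, 9, 10, 11, 12)

# Explicit recurrence: df3[1] = 100, df3[i] = df3[i - 1] * (1 + df2[i])
df3_loop <- numeric(length(df2))
df3_loop[1] <- 100
for (i in seq_along(df2)[-1]) {
  df3_loop[i] <- df3_loop[i - 1] * (1 + df2[i])
}

# Vectorised form: force the first growth factor to 1, then take the running product
df3_cumprod <- 100 * cumprod(c(0, df2[-1]) + 1)

all.equal(df3_loop, df3_cumprod)
# [1] TRUE
```

Unrolling the loop gives df3[i] = 100 * (1 + df2[2]) * ... * (1 + df2[i]), which is exactly what cumprod accumulates.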


Previous answer with Reduce:

This is a good job for Reduce. The function we apply is plain multiplication, and we make sure to:

  • add 1 to df2 and ignore its first value;
  • accumulate the intermediate results (accumulate = TRUE).

code:

DT$df3 <- Reduce(`*`,DT$df2[-1]+1,init = 100,accumulate = TRUE)
DT
#    df1 df2     df3
# 1:   1   8     100
# 2:   2   9    1000
# 3:   3  10   11000
# 4:   4  11  132000
# 5:   5  12 1716000

This works with base R. For more idiomatic data.table syntax, one can follow @jogo's advice and write:

DT[, df3:=Reduce('*', df2[-1]+1, init = 100,accumulate = TRUE)]

And for completeness, this would be the tidyverse way (accumulate comes from purrr):

library(tidyverse)
DT %>% mutate(df3 = accumulate(df2[-1]+1,`*`,.init = 100))
Moody_Mudskipper answered Sep 22 '22 16:09
