Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dynamically create new columns in data.table

Tags:

r

data.table

I have a data.table in R and want to create a new column. Let's say that I have the date column name saved as a variable and want to append _year to that name in the new column. I'm able to do that the normal route by just specifying the name, but how can I create the new column name using the date_col variable.

Here is what I've tried. The last two, which I want, don't work.

dat = data.table(one = 1:5, two = 1:5, 
                 order_date = lubridate::ymd("2015-01-01","2015-02-01","2015-03-01",
                           "2015-04-01","2015-05-01"))
dat
date_col = "order_date"
dat[,`:=`(OrderDate_year = substr(get(date_col)[!is.na(get(date_col))],1,4))][]
dat[,`:=`(new = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]
dat[,`:=`(paste0(date_col, "_year", sep="") = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]
dat[,`:=`(noquote(paste0(date_col, "_year", sep="")) = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]
like image 275
ATMA Avatar asked Feb 27 '26 18:02

ATMA


2 Answers

The last two statements return an error message:

dat[,`:=`(paste0(date_col, "_year", sep="") = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]
Error: unexpected '=' in "dat[,`:=`(paste0(date_col, "_year", sep="") ="
dat[,`:=`(noquote(paste0(date_col, "_year", sep="")) = substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))],1,4))][]
Error: unexpected '=' in "dat[,`:=`(noquote(paste0(date_col, "_year", sep="")) ="

The correct syntax for calling the :=() function is:

dat[, `:=`(paste0(date_col, "_year", sep = ""), 
           substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))], 1, 4))][]
dat[, `:=`(noquote(paste0(date_col, "_year", sep = "")), 
           substr(noquote(get(date_col))[!is.na(noquote(get(date_col)))], 1, 4))][]

i.e., replace = by ,.


However, assignment syntax and right hand side are far too complicated.

The order_date column is already of class Date:

str(dat)
Classes ‘data.table’ and 'data.frame':    5 obs. of  3 variables:
 $ one       : int  1 2 3 4 5
 $ two       : int  1 2 3 4 5
 $ order_date: Date, format: "2015-01-01" "2015-02-01" ...
 - attr(*, ".internal.selfref")=<externalptr>

In order to extract the year, year() function can be used (either from the data.table package or the lubridate package whatever is loaded last), so no conversion back to character and extraction of the year string is required:

date_col = "order_date"
dat[, paste0(date_col, "_year") := lapply(.SD, year), .SDcols = date_col][]
   one two order_date order_date_year
1:   1   1 2015-01-01            2015
2:   2   2 2015-02-01            2015
3:   3   3 2015-03-01            2015
4:   4   4 2015-04-01            2015
5:   5   5 2015-05-01            2015

Alternatively,

dat[, paste0(date_col, "_year") := year(get(date_col))][]
dat[, `:=`(paste0(date_col, "_year"), year(get(date_col)))][]

work as well.

like image 55
Uwe Avatar answered Mar 01 '26 08:03

Uwe


The set function is nice for doing this. Quicker than setting inside the data.table too. Is this what you are after? http://brooksandrew.github.io/simpleblog/articles/advanced-data-table/#fast-looping-with-set

library(data.table)
dat = data.table(one = 1:5, two = 1:5, 
                 order_date = lubridate::ymd("2015-01-01","2015-02-01","2015-03-01",
                           "2015-04-01","2015-05-01"))
dat
date_col = "order_date"

year_col <- paste0(date_col, "_year", sep="")
set(dat, j = year_col, value = substr(dat[[date_col]], 1, 4) )
like image 22
Jonny Phelps Avatar answered Mar 01 '26 10:03

Jonny Phelps



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!