I have a large data.table in R with several columns with dollar values. In a different column I have an inflation adjustment number. I am trying to figure out how to update each of my monetary columns with it multiplied by the inflation adjustment column. Suppose I have the data:
DT <- data.table(id=1:1000,year=round(runif(1000)*10), inc1 = runif(1000), inc2 = runif(1000), inc3 = runif(1000), deflator = rnorm(1000))
which gives output:
id year inc1 inc2 inc3 deflator 1: 1 8 0.4754808 0.6678110 0.41533976 -0.64126988 2: 2 2 0.6568746 0.7765634 0.70616373 0.39687915 3: 3 6 0.8192947 0.9236281 0.90002534 -0.69545700 4: 4 4 0.7781929 0.1624902 0.17565790 0.05263055 5: 5 7 0.6232520 0.8024975 0.86449836 0.70781887 --- 996: 996 2 0.9676383 0.2238746 0.19822000 0.78564836 997: 997 9 0.9877410 0.5783748 0.57497438 -1.63365223 998: 998 8 0.2220570 0.6500632 0.19814932 1.00260174 999: 999 3 0.4793767 0.2830457 0.54835581 1.04168818 1000: 1000 8 0.2003476 0.6121637 0.02921505 0.34933690
in reality I have inc1
- inc100
, rather than just three variables and I want to figure out a way to perform this action:
DT[, inc1 := inc1 * deflator]
for each of my 100 income columns (inc1, inc2, inc3 in the fake data above). I will have more than 100 columns in the future, so I would like to figure out a way to loop the action over the columns. Is there a way to do this for all the income columns at once?
I would like to do something like:
inc_cols = c(inc1, inc2, inc3) DT[, inc_cols := lapply(inc_cols,function(x)= x * deflator),]
or
DT[, inc_cols := lapply(.SD,function(x)= x * deflator),.SDcols = inc_cols]
but neither of these seem to work. I also tried using the get()
function to make it clear deflator
is a referencing a column, like:
DT[, inc_cols := lapply(.SD,function(x)= x * get(deflator)),.SDcols = inc_cols]
but had no luck. I also tried to loop through the variables with something like:
for (var in inc_cols) { print(var) DT[, get(var) := get(var) *infAdj2010_mult] }
which returns
[1] "inc1" Error in get(var) : object 'inc1' not found
I realize this is probably a straight forward question and I have tried to search the other questions here and various online guides and tutorials, but I cannot find an example matching my specific problem. It is similar to this question, but not exactly.
Thanks for your help!
In R the asterisk (*) is used for element-wise multiplication. This is where the elements in the same row are multiplied by one another. We can see that the output of c*x and x*c are the same, and the vector x doubles matrix c. In R percent signs combined with asterisks are used for matrix multiplication (%*%).
To multiply only one column with a number, we can simply use the multiplication operator * but need to replace the original column with the new values.
To pick out single or multiple columns use the select() function. The select() function expects a dataframe as it's first input ('argument', in R language), followed by the names of the columns you want to extract with a comma between each name.
You could try
DT[, (inc_cols) := lapply(.SD, function(x) x * DT[['deflator']] ), .SDcols = inc_cols] head(DT1,2) # id year inc1 inc2 inc3 deflator #1: 1 3 0.614838304 0.009796974 0.3236051 0.7735552 #2: 2 2 -0.001583579 -0.082289606 -0.1365115 -0.6644330
Or if you need a loop
for(inc in inc_cols){ nm1 <- as.symbol(inc) DT[,(inc):= eval(nm1)*deflator] } head(DT,2) # id year inc1 inc2 inc3 deflator #1: 1 3 0.614838304 0.009796974 0.3236051 0.7735552 #2: 2 2 -0.001583579 -0.082289606 -0.1365115 -0.6644330
Or a possible option using set
which should be very fast as the overhead of [.data.table
is avoided (suggested by @Arun)
indx <- grep('inc', colnames(DT)) for(j in indx){ set(DT, i=NULL, j=j, value=DT[[j]]*DT[['deflator']]) } head(DT,2) # id year inc1 inc2 inc3 deflator #1: 1 3 0.614838304 0.009796974 0.3236051 0.7735552 #2: 2 2 -0.001583579 -0.082289606 -0.1365115 -0.6644330
where
inc_cols <- grep('^inc', colnames(DT), value=TRUE)
set.seed(24) DT <- data.table(id=1:1000,year=round(runif(1000)*10), inc1 = runif(1000), inc2 = runif(1000), inc3 = runif(1000), deflator = rnorm(1000))
Since you can use dplyr on data.tables, you could also do:
library(dplyr) DT %>% mutate_each(funs(.*deflator), starts_with("inc"))
Which will multiply each column of DT that starts with "inc" by the "deflator" column.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With