Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multiply many columns by a specific other column in R with data.table?

Tags:

r

data.table

I have a large data.table in R with several columns with dollar values. In a different column I have an inflation adjustment number. I am trying to figure out how to update each of my monetary columns with it multiplied by the inflation adjustment column. Suppose I have the data:

   DT <- data.table(id=1:1000,year=round(runif(1000)*10),            inc1 = runif(1000), inc2 = runif(1000), inc3 = runif(1000),               deflator = rnorm(1000)) 

which gives output:

             id year      inc1      inc2       inc3    deflator    1:    1    8 0.4754808 0.6678110 0.41533976 -0.64126988    2:    2    2 0.6568746 0.7765634 0.70616373  0.39687915    3:    3    6 0.8192947 0.9236281 0.90002534 -0.69545700    4:    4    4 0.7781929 0.1624902 0.17565790  0.05263055    5:    5    7 0.6232520 0.8024975 0.86449836  0.70781887   ---                                                       996:  996    2 0.9676383 0.2238746 0.19822000  0.78564836  997:  997    9 0.9877410 0.5783748 0.57497438 -1.63365223  998:  998    8 0.2220570 0.6500632 0.19814932  1.00260174  999:  999    3 0.4793767 0.2830457 0.54835581  1.04168818 1000: 1000    8 0.2003476 0.6121637 0.02921505  0.34933690 

in reality I have inc1 - inc100, rather than just three variables and I want to figure out a way to perform this action:

DT[, inc1 := inc1 * deflator] 

for each of my 100 income columns (inc1, inc2, inc3 in the fake data above). I will have more than 100 columns in the future, so I would like to figure out a way to loop the action over the columns. Is there a way to do this for all the income columns at once?

I would like to do something like:

inc_cols = c(inc1, inc2, inc3)  DT[, inc_cols := lapply(inc_cols,function(x)= x * deflator),] 

or

DT[, inc_cols := lapply(.SD,function(x)= x * deflator),.SDcols = inc_cols] 

but neither of these seem to work. I also tried using the get() function to make it clear deflator is a referencing a column, like:

DT[, inc_cols := lapply(.SD,function(x)= x * get(deflator)),.SDcols = inc_cols] 

but had no luck. I also tried to loop through the variables with something like:

for (var in inc_cols) {   print(var)   DT[, get(var) := get(var) *infAdj2010_mult]  } 

which returns

[1] "inc1"  Error in get(var) : object 'inc1' not found  

I realize this is probably a straight forward question and I have tried to search the other questions here and various online guides and tutorials, but I cannot find an example matching my specific problem. It is similar to this question, but not exactly.

Thanks for your help!

like image 750
johneric Avatar asked Jan 24 '15 06:01

johneric


People also ask

How do I multiply certain rows in R?

In R the asterisk (*) is used for element-wise multiplication. This is where the elements in the same row are multiplied by one another. We can see that the output of c*x and x*c are the same, and the vector x doubles matrix c. In R percent signs combined with asterisks are used for matrix multiplication (%*%).

How do I multiply a column by another column in R?

To multiply only one column with a number, we can simply use the multiplication operator * but need to replace the original column with the new values.

How do I select multiple columns in a table in R?

To pick out single or multiple columns use the select() function. The select() function expects a dataframe as it's first input ('argument', in R language), followed by the names of the columns you want to extract with a comma between each name.


2 Answers

You could try

DT[, (inc_cols) := lapply(.SD, function(x)          x * DT[['deflator']] ), .SDcols = inc_cols] head(DT1,2) #   id year         inc1         inc2       inc3   deflator #1:  1    3  0.614838304  0.009796974  0.3236051  0.7735552 #2:  2    2 -0.001583579 -0.082289606 -0.1365115 -0.6644330 

Or if you need a loop

for(inc in inc_cols){   nm1 <- as.symbol(inc)   DT[,(inc):= eval(nm1)*deflator] }   head(DT,2)  #  id year         inc1         inc2       inc3   deflator  #1:  1    3  0.614838304  0.009796974  0.3236051  0.7735552  #2:  2    2 -0.001583579 -0.082289606 -0.1365115 -0.6644330 

Or a possible option using set which should be very fast as the overhead of [.data.table is avoided (suggested by @Arun)

indx <- grep('inc', colnames(DT))  for(j in indx){  set(DT, i=NULL, j=j, value=DT[[j]]*DT[['deflator']]) } head(DT,2) #  id year         inc1         inc2       inc3   deflator #1:  1    3  0.614838304  0.009796974  0.3236051  0.7735552 #2:  2    2 -0.001583579 -0.082289606 -0.1365115 -0.6644330 

where

inc_cols <-  grep('^inc', colnames(DT), value=TRUE) 

data

set.seed(24) DT <- data.table(id=1:1000,year=round(runif(1000)*10),        inc1 = runif(1000), inc2 = runif(1000), inc3 = runif(1000),           deflator = rnorm(1000))  
like image 111
akrun Avatar answered Sep 18 '22 18:09

akrun


Since you can use dplyr on data.tables, you could also do:

library(dplyr) DT %>% mutate_each(funs(.*deflator), starts_with("inc")) 

Which will multiply each column of DT that starts with "inc" by the "deflator" column.

like image 43
talat Avatar answered Sep 19 '22 18:09

talat