Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Update values in specified columns based on external data table

I want to update the values in some specified columns of a data table based on external values from another data table.

I know how to do this variable by variable, but I wanted a more efficient solution that I can automate, perhaps using lapply

UPDATE: My data set (equivalent to mtcars in the example below) have other columns which I don't want to update.

For a reproducible example, run this snippet of code first

# Load library
  library(data.table)

# load data
  data(mtcars)

# create another Data Table which we'll use as external reference
# this table shows the number of decimals of those variables we want to modify in mtcars
  var <- c("mpg","disp","hp")
  Decimal <- c(1,3,2)
  DT <- cbind.data.frame(var, Decimal)

# convert into data.table
  setDT(DT)
  setDT(mtcars)

My code, updating column by column

mtcars[, mpg  := mpg  / 10 ^ DT[var=="mpg",  Decimal] ]
mtcars[, disp := disp / 10 ^ DT[var=="disp", Decimal] ]
mtcars[, hp   := hp   / 10 ^ DT[var=="hp",   Decimal] ]

This code works fine and it gives the desired result.

Desired Result

The first row of mtcars used to look like this:

>     mpg disp  hp
> 1: 21.0  160 110

and now it looks like this:

>     mpg   disp   hp
> 1: 2.10  0.160 1.10

Is there a more efficient solution using function , lapply etc?

like image 900
rafa.pereira Avatar asked Feb 08 '23 22:02

rafa.pereira


1 Answers

We can also use set for multiple columns. It is very efficient as the overhead of [.data.table is avoided. We loop over the column index of 'mtcars', and use set to change the columns specified by 'j' with the value from the calculation of corresponding 'mtcars' columns with 'DT$Decimal' elements.

for(j in seq_along(mtcars)){
  set(mtcars, i=NULL, j=j, value=mtcars[[j]]/10^DT[,Decimal][j])
 }


head(mtcars)
#    mpg  disp   hp
#1: 2.10 0.160 1.10
#2: 2.10 0.160 1.10
#3: 2.28 0.108 0.93
#4: 2.14 0.258 1.10
#5: 1.87 0.360 1.75
#6: 1.81 0.225 1.05

EDIT: Based on the OP's comments, suppose if we are not subsetting the dataset and want to keep all the columns, while transforming some columns specified in the 'var', we can loop over the 'var' and use set to change the columns specified by the 'var'. Here, I am using the full mtcars dataset after converting to data.table.

 for(j in seq_along(var)){
  set(mtcars, i=NULL, j=var[j], value=mtcars[[var[j]]]/10^DT[, Decimal][j])
 }

head(mtcars)
#    mpg cyl  disp   hp drat    wt  qsec vs am gear carb
#1: 2.10   6 0.160 1.10 3.90 2.620 16.46  0  1    4    4
#2: 2.10   6 0.160 1.10 3.90 2.875 17.02  0  1    4    4
#3: 2.28   4 0.108 0.93 3.85 2.320 18.61  1  1    4    1
#4: 2.14   6 0.258 1.10 3.08 3.215 19.44  1  0    3    1
#5: 1.87   8 0.360 1.75 3.15 3.440 17.02  0  0    3    2
#6: 1.81   6 0.225 1.05 2.76 3.460 20.22  1  0    3    1
like image 78
akrun Avatar answered Feb 11 '23 21:02

akrun