 

Apply a function to every specified column in a data.table and update by reference

Tags:

r

data.table

I have a data.table, and I'd like to perform the same operation on certain of its columns. The names of these columns are given in a character vector. In this particular example, I'd like to multiply all of these columns by -1.

Some toy data and a vector specifying relevant columns:

library(data.table)
dt <- data.table(a = 1:3, b = 1:3, d = 1:3)
cols <- c("a", "b")

Right now I'm doing it this way, looping over the character vector:

for (col in 1:length(cols)) {
  dt[ , eval(parse(text = paste0(cols[col], ":=-1*", cols[col])))]
}
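For comparison, here is a minimal sketch of the same loop without building code strings for eval(parse()). Wrapping the loop variable in parentheses on the left of := makes data.table assign to the column whose name the variable holds, and get() fetches that column's values inside j. It still loops, but each step is ordinary code rather than parsed text.

```r
library(data.table)

dt <- data.table(a = 1:3, b = 1:3, d = 1:3)
cols <- c("a", "b")

for (col in cols) {
  # (col) := ... targets the column named by the string in `col`;
  # get(col) looks up that column's values inside j
  dt[, (col) := -1 * get(col)]
}
```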

Is there a way to do this directly without the for loop?

Dean MacGregor asked May 30 '13 at 21:05




1 Answer

This seems to work:

dt[ , (cols) := lapply(.SD, "*", -1), .SDcols = cols] 

The result is

    a  b d
1: -1 -1 1
2: -2 -2 2
3: -3 -3 3
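The one-liner generalizes to any function that takes a column and returns a vector of the same length. Below is a minimal sketch; update_cols is a hypothetical helper name, not part of data.table. Because := works by reference, calling it inside a function still modifies the caller's table, so nothing needs to be returned and reassigned.

```r
library(data.table)

# Hypothetical helper (not part of data.table): replace every column named
# in `cols` with fun(column), modifying `dt` by reference
update_cols <- function(dt, cols, fun) {
  dt[, (cols) := lapply(.SD, fun), .SDcols = cols]
  invisible(dt)
}

dt <- data.table(a = 1:3, b = 1:3, d = 1:3)
update_cols(dt, c("a", "b"), function(x) -1 * x)  # same effect as the one-liner
update_cols(dt, "d", function(x) x^2)             # any one-column function works
# a and b are now -1, -2, -3; d is now 1, 4, 9
```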

There are a few tricks here:

  • Because there are parentheses in (cols) :=, the result is assigned to the columns specified in cols, instead of to some new variable named "cols".
  • .SDcols tells the call that we're only looking at those columns, and allows us to use .SD, the Subset of the Data associated with those columns.
  • lapply(.SD, ...) operates on .SD, which is a list of columns (like all data.frames and data.tables). lapply returns a list, so in the end j looks like (cols) := list(...).
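Each of these points can be checked directly. This sketch inspects what .SD holds when .SDcols is given, confirms that lapply() over those columns produces a named list, and verifies that the parenthesised left-hand side writes to a and b rather than creating a column called "cols".

```r
library(data.table)

dt <- data.table(a = 1:3, b = 1:3, d = 1:3)
cols <- c("a", "b")

# .SDcols limits .SD to the listed columns; this j only reports what .SD holds
sd_names <- dt[, names(.SD), .SDcols = cols]

# .SD is a list of columns, so lapply() gives back a list with one element
# per column; that list is the right-hand side of (cols) := ...
rhs <- lapply(dt[, .SD, .SDcols = cols], "*", -1)

# With the parentheses, := assigns to the columns named in `cols`
dt[, (cols) := lapply(.SD, "*", -1), .SDcols = cols]
```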

EDIT: Here's another way that is probably faster, as @Arun mentioned:

for (j in cols) set(dt, j = j, value = -dt[[j]]) 
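A sketch showing that the set() loop really works in place: data.table's address() reports the same memory address for the table before and after the loop. The data.table documentation describes set() as a low-overhead, loopable version of :=, which is the likely source of the speed difference. One side effect worth knowing: unary minus keeps a and b as integers here, whereas lapply(.SD, "*", -1) turns them into doubles because -1 is a double.

```r
library(data.table)

dt <- data.table(a = 1:3, b = 1:3, d = 1:3)
cols <- c("a", "b")

addr_before <- address(dt)  # memory address of the table object

# Each set() call replaces one whole column in place,
# without going through the [.data.table method
for (j in cols) set(dt, j = j, value = -dt[[j]])

addr_after <- address(dt)
```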
Frank answered Sep 28 '22 at 14:09