Transform a set of columns in a data.table

Tags: r, data.table

A data.table novice question: I would like to transform a set of columns in a data.table by applying a mathematical formula to them. The set should include every column except one or more that I want to leave untouched.

In data.frame terms I would do:

data(iris)
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

iris[, -5] <- iris[, -5] * 1e3
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1         5100        3500         1400         200  setosa
2         4900        3000         1400         200  setosa
3         4700        3200         1300         200  setosa
4         4600        3100         1500         200  setosa
5         5000        3600         1400         200  setosa
6         5400        3900         1700         400  setosa

I know how to select multiple columns in a data.table:

library(data.table)
iris.dt <- data.table(iris)
head(iris.dt[, -5, with = FALSE])

or even:

head(iris.dt[, !"Species", with = FALSE])

How do I actually transform those selected columns in place, taking advantage of data.table's pass-by-reference semantics?

mbask asked Nov 29 '12 10:11


2 Answers

What about using the .SDcols argument along with assignment by reference (:=)?

library(data.table)

DT <- data.table(iris)
# Multiply the four measurement columns by 1000, assigning by reference
DT[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width") :=
     lapply(.SD, function(x) x * 1000), .SDcols = 1:4]
# Alternatively you can grab the names the usual way:
# DT[, names(DT)[1:4] := lapply(.SD, function(x) x * 1000), .SDcols = 1:4]
DT
#      Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
#   1:         5100        3500         1400         200    setosa
#   2:         4900        3000         1400         200    setosa
#   3:         4700        3200         1300         200    setosa
#   4:         4600        3100         1500         200    setosa
#   5:         5000        3600         1400         200    setosa
#  ---                                                            
# 146:         6700        3000         5200        2300 virginica
# 147:         6300        2500         5000        1900 virginica
# 148:         6500        3000         5200        2000 virginica
# 149:         6200        3400         5400        2300 virginica
# 150:         5900        3000         5100        1800 virginica
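
Since the question asks for every column except some, the target columns can also be derived by exclusion rather than listed by position. This is a minimal sketch, assuming Species is the only column to skip; the names `skip`, `cols` and `addr_before` are illustrative and not part of the original answer.

```r
library(data.table)

DT <- data.table(iris)

# Columns to leave untouched (assumption: only Species is excluded)
skip <- "Species"
cols <- setdiff(names(DT), skip)

# Record the table's memory address so the in-place update can be checked
addr_before <- address(DT)

# Multiply the selected columns by 1000, assigning by reference
DT[, (cols) := lapply(.SD, function(x) x * 1000), .SDcols = cols]

head(DT)

# Compare the address before and after the update
identical(addr_before, address(DT))
```

Building `cols` with `setdiff()` keeps the exclusion list in one place, so skipping further columns only means extending `skip`.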

A5C1D2H2I1M1N2O1R2T1 answered Sep 21 '22 17:09


.SDcols is the right approach, but you can specify the column names just once using a vector.

DT <- data.table(iris)
colnms <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
DT[, (colnms) := lapply(.SD, function(x) x*1000), .SDcols = colnms]

Note that the parentheses around colnms on the left of := are required: they make data.table evaluate colnms as a variable holding the column names, rather than treating colnms as the literal name of a column.
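
To see the effect of the parentheses, the two forms can be compared on a small toy table. This is only an illustrative sketch; the tables `DT` and `DT2` and their columns `a` and `b` are made up for the comparison.

```r
library(data.table)

colnms <- c("a", "b")

# With parentheses: colnms is evaluated, so the existing columns a and b are updated
DT <- data.table(a = c(1, 2, 3), b = c(4, 5, 6))
DT[, (colnms) := lapply(.SD, function(x) x * 10), .SDcols = colnms]
names(DT)   # "a" "b"

# Without parentheses: a new column literally named "colnms" is added
DT2 <- data.table(a = c(1, 2, 3), b = c(4, 5, 6))
DT2[, colnms := a * 10]
names(DT2)  # "a" "b" "colnms"
```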

Jonathan Rougier answered Sep 20 '22 17:09