A data.table novice question.
I would like to transform a set of columns in a data.table by applying a mathematical formula to them. The set must exclude one or more of the table's columns.
In data.frame terms I would do:
data(iris)
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
iris[, -5] <- iris[, -5] * 1e3
head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5100 3500 1400 200 setosa
2 4900 3000 1400 200 setosa
3 4700 3200 1300 200 setosa
4 4600 3100 1500 200 setosa
5 5000 3600 1400 200 setosa
6 5400 3900 1700 400 setosa
I know how to select multiple columns in a data.table:
library(data.table)
iris.dt <- data.table(iris)
head(iris.dt[, -5, with = FALSE])
or even:
head(iris.dt[, !"Species", with = FALSE])
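In more recent data.table versions, you can build the column vector by exclusion and select it without `with = FALSE` by using the `..` prefix. The sketch below assumes data.table 1.10.2 or later, which is when `..` was introduced, and the names `keep` and `sub` are illustrative. Selecting like this still returns a copy, so it does not by itself answer the by-reference part of the question.

```r
library(data.table)

iris.dt <- as.data.table(iris)

# Names of every column except the one(s) to exclude
keep <- setdiff(names(iris.dt), "Species")

# The ".." prefix tells data.table to look `keep` up in the calling scope
# instead of treating it as a column name (assumes data.table >= 1.10.2)
sub <- iris.dt[, ..keep]
print(head(sub))
```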
How can I actually transform those selected columns while taking advantage of data.table's pass-by-reference semantics?
setDT() converts lists (both named and unnamed) and data.frames to data.tables by reference. The original object is modified in place instead of being copied. This feature was requested on Stack Overflow.
To reorder data.table columns, the idiomatic way is to use setcolorder(x, neworder) instead of x <- x[, neworder, with = FALSE]. The latter makes a full copy of the data.table, which is usually unnecessary.
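Here is a small sketch of the two by-reference helpers just described. The toy data.frame `df` is hypothetical. Neither call needs its result assigned back, because both modify `df` in place.

```r
library(data.table)

# Hypothetical toy data
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

# setDT() converts df in place; no `df <- ...` assignment is needed
setDT(df)

# setcolorder() reorders columns by reference, avoiding the full copy made by
# df <- df[, c("b", "a"), with = FALSE]
setcolorder(df, c("b", "a"))
print(names(df))
```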
What about using the .SDcols argument along with assignment by reference (:=)?
library(data.table)
DT <- data.table(iris)
DT[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width") := lapply(.SD, function(x) x * 1000), .SDcols = 1:4]
# Alternatively you can grab the names the usual way:
# DT[, names(DT)[1:4] := lapply(.SD, function(x) x*1000), .SDcols=1:4]
DT
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1: 5100 3500 1400 200 setosa
# 2: 4900 3000 1400 200 setosa
# 3: 4700 3200 1300 200 setosa
# 4: 4600 3100 1500 200 setosa
# 5: 5000 3600 1400 200 setosa
# ---
# 146: 6700 3000 5200 2300 virginica
# 147: 6300 2500 5000 1900 virginica
# 148: 6500 3000 5200 2000 virginica
# 149: 6200 3400 5400 2300 virginica
# 150: 5900 3000 5100 1800 virginica
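Another option, which differs from the := form above, is data.table's `set()` function. It is a low-overhead, loopable version of `:=` that replaces one column by reference on each call. The sketch below assumes the columns to transform are all the columns except `Species`.

```r
library(data.table)

DT <- as.data.table(iris)
cols <- setdiff(names(DT), "Species")

# Each set() call replaces column j of DT by reference; no copy of DT is made
for (j in cols) {
  set(DT, j = j, value = DT[[j]] * 1000)
}
print(head(DT))
```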
.SDcols is the right approach, but you can specify the column names just once by storing them in a vector.
DT <- data.table(iris)
colnms <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
DT[, (colnms) := lapply(.SD, function(x) x*1000), .SDcols = colnms]
Note that you need the parentheses around colnms to the left of := to stop data.table from interpreting colnms as the name of a column.
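The question asks for "every column except some," so the following sketch builds `colnms` by exclusion using `setdiff()` instead of listing the names out. It also uses data.table's `address()` to check that `:=` updated the existing object and did not produce a copy. The variable names `before` and `after` are only for illustration.

```r
library(data.table)

DT <- as.data.table(iris)

# Build the column vector by excluding the column(s) to leave untouched
colnms <- setdiff(names(DT), "Species")

before <- address(DT)
DT[, (colnms) := lapply(.SD, function(x) x * 1000), .SDcols = colnms]
after <- address(DT)

# The address is unchanged: := modified DT itself rather than a copy
print(identical(before, after))
print(head(DT))
```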