Using data.table to calculate a function which depends on many columns

Question

There are many posts which discuss applying a function over many columns when using data.table. However I need to calculate a function which depends on many columns. As an example:

# Create a data table with 26 columns.  Variable names are var1, ..., var 26
data.mat = matrix(sample(letters, 26*26, replace=TRUE),ncol=26)
colnames(data.mat) = paste("var",1:26,sep="")
data.dt <- data.table(data.mat)

Now, say I would like to count the number of 'a's in columns 5,6,7 and 8. I cannot see how to do this with SDcols and end up doing:

data.dt[,numberOfAs := (var5=='a')+(var6=='a')+(var7=='a')+(var7=='a')]

Which is very tedious. Is there a more sensible way to do this?

Thanks

Arun · Accepted Answer

I really suggest going through the vignettes linked here. Section 2e from the Introduction to data.table vignette explains .SD and .SDcols.

.SD is just a data.table containing the data for current group. And .SDcols tells the columns .SD should have. A useful way is to use print to see the content.

# .SD contains cols 5:8
data.dt[, print(.SD), .SDcols=5:8]

Since there is no by here, .SD contains all the rows of data.dt, corresponding to the columns specified in .SDcols.

Once you understand this, the task reduces to your knowledge of base R really. You can accomplish this in more than one way.

data.dt[, numberOfAs := rowSums(.SD == "a"), .SDcols=5:8]

We return a logical matrix by comparing all the columns in .SD to "a". And then use rowSums to sum them up.

Another way using Reduce:

data.dt[, numberOfAs := Reduce(`+`, lapply(.SD, function(x) x == "a")), .SDcols=5:8]

Using data.table to calculate a function which depends on many columns

Tags:

r

data.table

Alan Chalk

1 Answers

Arun

Recent Activity

Donate For Us

Using data.table to calculate a function which depends on many columns

Tags:

r

data.table

Alan Chalk

1 Answers

Arun

Related questions

Recent Activity

Donate For Us