
Using a function in lapply in data.table in R

Suppose I have a sample data set like the one below.

> tmp <- data.table(x=c(1:10),y=(5:14))
> tmp
     x  y
 1:  1  5
 2:  2  6
 3:  3  7
 4:  4  8
 5:  5  9
 6:  6 10
 7:  7 11
 8:  8 12
 9:  9 13
10: 10 14

I want to keep the two lowest numbers in each column and change all the other numbers to 0.

Like this:

   x y
 1: 1 5
 2: 2 6
 3: 0 0
 4: 0 0
 5: 0 0
 6: 0 0
 7: 0 0
 8: 0 0
 9: 0 0
10: 0 0

I think the code should be something like

tmp[, c("x","y"):=lapply(.SD, function(x) {x[which(!x %in% sort(x)[1:2])] = 0}), .SDcols=c("x","y")]

but it changes every value to 0.

How can I solve this problem?

Rokmc1050 asked Sep 29 '22 05:09



1 Answer

To expand on my comment, I'd do something like this:

for (j in names(tmp)) {
    col = tmp[[j]]
    min_2 = sort.int(unique(col), partial=2L)[2L] # 2nd lowest value
    set(tmp, i = which(col > min_2), j = j, value = 0L)
}

This loops over all the columns in tmp and gets the 2nd lowest value of each column using sort.int with the partial argument, which is slightly more efficient than using sort (as we don't have to sort the entire column to find the 2nd lowest value).
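As a small illustration of the partial sort on its own, here is a sketch in base R. The vector x and the name second_lowest are made up for this example; only position 2 of the partially sorted result is guaranteed to be in its final sorted place, which is all we need.

```r
# Hypothetical example vector, not taken from the question's data
x <- c(5L, 3L, 9L, 1L, 7L)

# partial = 2L only guarantees that element 2 is where a full sort would put it
second_lowest <- sort.int(unique(x), partial = 2L)[2L]

# A full sort gives the same value at position 2, at the cost of ordering everything
full_sort_second <- sort(unique(x))[2L]

print(second_lowest)
```

Calling unique() first means duplicated minimum values count once, so the threshold is the 2nd lowest distinct value.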

Then we use set() to replace those rows where the column value is greater than the 2nd minimum value, for that column, with the value 0.
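If you would rather keep the lapply form from the question, a likely fix is to make the function return the modified vector. In R a function returns the value of its last expression, and the value of an assignment is the assigned value, so a function body that ends with x[...] = 0 hands back 0 and := recycles it over the whole column. A minimal sketch, assuming the same tmp as in the question (keep_two_lowest and cols are hypothetical names, not part of data.table):

```r
library(data.table)

tmp <- data.table(x = 1:10, y = 5:14)
cols <- c("x", "y")

# keep_two_lowest is a hypothetical helper name used only for illustration
keep_two_lowest <- function(v) {
    # zero out every element that is not one of the two lowest distinct values
    v[!v %in% sort(unique(v))[1:2]] <- 0L
    # return the modified vector explicitly, not the result of the assignment
    v
}

# (cols) on the left-hand side makes := treat cols as a vector of column names
tmp[, (cols) := lapply(.SD, keep_two_lowest), .SDcols = cols]
print(tmp)
```

This uses a full sort, so the set() loop above should still be the cheaper option on large columns.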

Arun answered Oct 06 '22 02:10
