
How do I reference a column in lapply which is not part of .SD?

Tags: r, data.table

I have a column in my data.table that contains the data I'd like to use to update several other columns. This column is a list, and for each row I need to subset its list element by the value in each of the columns that I include in .SDcols.

My data....

dt <- data.table( A = list( c("X","Y") , c("J","K") ) , B = c(1,2) , C = c(2,1) )
#     A B C
#1: X,Y 1 2
#2: J,K 2 1

My desired result....

#     A B C
#1: X,Y X Y
#2: J,K K J

What I tried....

# Column A is not included in SD so not found...
dt[ , lapply( .SD , function(x) A[x] ) , .SDcols = 2:3 ]
#Error in FUN(X[[1L]], ...) : object 'A' not found


# This also does not work. It sees all of A as one long vector (look at the results for C)
for( i in 2:3 ) dt[ , names(dt)[i] := unlist(A)[ get(names(dt)[i]) ] ]
#     A B C
#1: X,Y X Y
#2: J,K Y X

# I saw this in another answer, but it also doesn't work:
# Basically we add an ID column and use 'by=' to try to solve the problem above
# Now we get a type mismatch
dt <- data.table( ID = 1:2 , A = list( c("X","Y") , c("J","K") ) , B = c(1,2) , C = c(2,1) , key = "ID" )
for( i in 3:4 ) dt[ , names(dt)[i] := unlist(A)[ get(names(dt)[i]) ] , by = ID ]
#Error in `[.data.table`(dt, , `:=`(names(dt)[i], unlist(A)[get(names(dt)[i])]),  : 
#  Type of RHS ('character') must match LHS ('double'). To check and coerce would impact performance too much for the fastest cases. Either change the type of the target column, or coerce the RHS of := yourself (e.g. by using 1L instead of 1)

If anyone is interested, my real data is a set of SNPs and INDELs across different isolates, and I am trying to do this:

# My real data looks more like this:
# In columns V10:V15;
# if '.' in first character then use data from 'Ref' column
# else use integer at first character to subset list in 'Alt' column
#   Contig  Pos V3 Ref Alt    Qual        V10       V11       V12       V13       V14       V15
#1:     1   172  .   T   C 81.0000  1/1:.:.:. ./.:.:.:. ./.:.:.:. ./.:.:.:. ./.:.:.:. ./.:.:.:.
#2:     1   399  .   G C,A 51.0000  ./.:.:.:. 1/1:.:.:. 2/2:.:.:. ./.:.:.:. 1/1:.:.:. ./.:.:.:.
#3:     1   516  .   T   G 57.0000  ./.:.:.:. 1/1:.:.:. ./.:.:.:. 1/1:.:.:. ./.:.:.:. ./.:.:.:.
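
The rule described in the comments above can be sketched roughly as follows. This is only an illustration under several assumptions: the toy table snp is modelled on the rows shown and trimmed to three genotype columns, Alt is assumed to be a list column of character vectors, and call_allele, ref_vec, alt_list and gt_cols are hypothetical names that do not come from the original post. Only the first character of each genotype string is inspected, so allele indices of 10 or more would need different parsing.

```r
library(data.table)

# Toy data modelled on the rows above (assumption: Alt is a list column)
snp <- data.table(
  Ref = c("T", "G", "T"),
  Alt = list("C", c("C", "A"), "G"),
  V10 = c("1/1:.:.:.", "./.:.:.:.", "./.:.:.:."),
  V11 = c("./.:.:.:.", "1/1:.:.:.", "1/1:.:.:."),
  V12 = c("./.:.:.:.", "2/2:.:.:.", "./.:.:.:.")
)

# Hypothetical helper: turn one genotype column into allele calls
call_allele <- function(gt, ref, alt) {
  first <- substr(gt, 1, 1)
  # "." cannot be parsed as an integer and becomes NA
  idx <- suppressWarnings(as.integer(first))
  # row-wise subset of the Alt list; an NA index yields NA
  picked <- mapply(`[`, alt, idx)
  # fall back to Ref wherever the first character is "."
  ifelse(first == ".", ref, picked)
}

gt_cols  <- c("V10", "V11", "V12")
ref_vec  <- snp[["Ref"]]
alt_list <- snp[["Alt"]]

snp[, (gt_cols) := lapply(.SD, call_allele, ref = ref_vec, alt = alt_list),
    .SDcols = gt_cols]
snp
```

Ref and Alt are copied into plain variables first so the helper does not depend on columns outside .SDcols being visible inside lapply.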
Simon O'Hanlon asked Jun 17 '14 11:06


2 Answers

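You can use mapply and set in a for loop. mapply pairs each list element of A with the index in the same row of the target column, and set replaces the whole column, so the change of type from double to character is not a problem. There may be more efficient ways.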

for (j in c('B', 'C')) {
    # subset each element of list column A by the index in column j, row by row
    set(dt, j = j, value = mapply(FUN = '[', dt[['A']], dt[[j]]))
}
dt
#      A B C
# 1: X,Y X Y
# 2: J,K K J
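
The same row-wise pairing can also be written with := and .SDcols. The following is a minimal sketch rather than part of the original answer: it assumes the example dt from the question, and cols and a are names introduced here for illustration. Copying A into a plain variable first avoids relying on A being visible inside the function passed to lapply.

```r
library(data.table)

dt <- data.table(A = list(c("X", "Y"), c("J", "K")), B = c(1, 2), C = c(2, 1))

cols <- c("B", "C")   # columns holding the indices
a    <- dt[["A"]]     # the list column, taken out of the table

# For each index column, subset every element of `a` by the matching row's index
dt[, (cols) := lapply(.SD, function(idx) mapply(`[`, a, idx)), .SDcols = cols]
dt
```

Because := is used here without by=, each column is replaced as a whole, so the switch from double to character does not raise the type-mismatch error seen in the grouped attempt.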
mnel answered Sep 28 '22 17:09


Hi, does this work for you?

dt$B <- apply(dt, 1, FUN = function(x) x$A[x$B])
dt$C <- apply(dt, 1, FUN = function(x) x$A[x$C])
dt
#     A B C
#1: X,Y X Y
#2: J,K K J
Victorp answered Sep 28 '22 18:09