Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

vectorially finding neighbors to data frame rows

I have two data frames, below is a small sample of each:

df1 <- data.frame(a1= c(3,4), a2 = c(8, 8), a3 = c(4, 18), a4 = c(9,9), a5 = c(17, 30))

df2 <- data.frame(a1 = c(2,2,2,3,3,3,4,4,4), a2 = c(7,7,7,7,7,7,7,7,7), 
                 a3 = c(4,4,4,4,4,4,4,4,4), a4 = c(10,10,10, 10, 10, 10, 10,10,10), 
                 a5 = c(15,16,17, 15, 16, 17, 15, 16, 17))

I would like to examine, for each row of df1, whether it has "neighbors" in df2, where, by neighbors I mean observations that are different by at most 1 in each column (in absolute value). So for example, row 2 of df2 is a neighbor of row 1 in df1.

The way I currently do this is the following:

sweep(as.matrix(df2), 2, as.matrix(df1[1,]), "-")

For row 1 of df1, and I have to repeat this for each row of df1. Note that df2 and df1 do not have the same number of rows.

However, what I would really like is to avoid doing this "by row", because my data frames have many rows. Is there a way to do it vectorially?

like image 826
Omry Atia Avatar asked Sep 12 '18 09:09

Omry Atia


3 Answers

You can use split your row of df1 into a list, and then use lapply to achieve Vectorization:

my_list=lapply(as.list(data.frame(t(df1))),function(x) sweep(as.matrix(df2), 2, as.matrix(x), "-"))

each element of my_list is the computation result of each row in df1

my_list[[1]]
      a1 a2 a3 a4 a5
 [1,] -1 -1  0  1 -2
 [2,] -1 -1  0  1 -1
 [3,] -1 -1  0  1  0
 [4,]  0 -1  0  1 -2
 [5,]  0 -1  0  1 -1
 [6,]  0 -1  0  1  0
 [7,]  1 -1  0  1 -2
 [8,]  1 -1  0  1 -1
 [9,]  1 -1  0  1  0

Also, you can use parallel::mclapply which is faster than traditional lapply

like image 170
Jim Chen Avatar answered Nov 13 '22 13:11

Jim Chen


Here is a possible data.table approach using non-equi joins

library(data.table)
cols <- names(df2)

#convert into data.table and add row index for clarity
setDT(df1)[, rn1 := .I]
setDT(df2)[, rn2 := .I]

#create a lower (-1) and upper (+1) bound on each column
bandsNames <- paste0(rep(cols, each=2L), "_", rep(c("lower", "upper"), length(cols)))
df2Bands <- df2[, 
    {
        ans <- do.call(cbind, lapply(.SD, function(x) outer(x, c(-1L, 1L), `+`)))
        setnames(data.table(ans), bandsNames)
    }, by=.(rn2)]

#create the non-equi join conditions
lowerLimits <- paste0(cols, "_lower<=", cols)
upperLimits <- paste0(cols, "_upper>=", cols)

#perform the non-equi join on lower and upper limits and return the count
#`:=` add a new column in df1 by reference
df1[, Count := 
        df2Bands[df1, .N, by=.EACHI, on=c(lowerLimits, upperLimits)]$N
    ]

desired output:

   a1 a2 a3 a4 a5 rn1 Count
1:  3  8  4  9 17   1     6
2:  4  8 18  9 30   2     0

If you want to find the matching rows as well:

df2Bands[df1, .(rn1=i.rn1, rn2=x.rn2), by=.EACHI, on=c(lowerLimits, upperLimits)][, 
    -(1L:length(bandsNames))]

Matched rows:

   rn1 rn2
1:   1   2
2:   1   3
3:   1   5
4:   1   6
5:   1   8
6:   1   9
7:   2  NA
like image 41
chinsoon12 Avatar answered Nov 13 '22 12:11

chinsoon12


I do not think there is a good way to fully vectorise this problem, (apply family are really just for loops in a bow tie). But you can do it on a by column basis, rather than by row. If further improvement is required the size of the problem can be reduced after each column by removing rows that can be excluded from ever matching (this will cause an indexing headache, but is relatively do-able).

My attempt is below which uses a for loop (which could be replaced by lapply). It returns a truth matrix, rows with a 1 can be matched to columns with a 1, which gives the pairing of neighbours.

col_comp = function(x,y)
{
    lx = length(x)
    ly = length(y)
    return(abs(rep(x,ly) - rep(y,each = lx) )<=1)
}

full_comp=function(df1,df2)
{
    rows1 = seq_len(nrow(df1))
    rows2 = seq_len(nrow(df2))
    M = matrix(1L, nrow=nrow(df1),ncol=nrow(df2))
    for(i in seq_len(ncol(df1)) )
    {
        matches = col_comp(df1[rows1,i],df2[rows2,i])

        M = M*matches
    }
    return(M)
}
like image 39
Aaron Hayman Avatar answered Nov 13 '22 14:11

Aaron Hayman