vectorially finding neighbors to data frame rows

Question

I have two data frames, below is a small sample of each:

df1 <- data.frame(a1= c(3,4), a2 = c(8, 8), a3 = c(4, 18), a4 = c(9,9), a5 = c(17, 30))

df2 <- data.frame(a1 = c(2,2,2,3,3,3,4,4,4), a2 = c(7,7,7,7,7,7,7,7,7), 
                 a3 = c(4,4,4,4,4,4,4,4,4), a4 = c(10,10,10, 10, 10, 10, 10,10,10), 
                 a5 = c(15,16,17, 15, 16, 17, 15, 16, 17))

I would like to examine, for each row of df1, whether it has "neighbors" in df2, where, by neighbors I mean observations that are different by at most 1 in each column (in absolute value). So for example, row 2 of df2 is a neighbor of row 1 in df1.

The way I currently do this is the following:

sweep(as.matrix(df2), 2, as.matrix(df1[1,]), "-")

For row 1 of df1, and I have to repeat this for each row of df1. Note that df2 and df1 do not have the same number of rows.

However, what I would really like is to avoid doing this "by row", because my data frames have many rows. Is there a way to do it vectorially?

Jim Chen · Accepted Answer

You can use split your row of df1 into a list, and then use lapply to achieve Vectorization:

my_list=lapply(as.list(data.frame(t(df1))),function(x) sweep(as.matrix(df2), 2, as.matrix(x), "-"))

each element of my_list is the computation result of each row in df1

my_list[[1]]
      a1 a2 a3 a4 a5
 [1,] -1 -1  0  1 -2
 [2,] -1 -1  0  1 -1
 [3,] -1 -1  0  1  0
 [4,]  0 -1  0  1 -2
 [5,]  0 -1  0  1 -1
 [6,]  0 -1  0  1  0
 [7,]  1 -1  0  1 -2
 [8,]  1 -1  0  1 -1
 [9,]  1 -1  0  1  0

Also, you can use parallel::mclapply which is faster than traditional lapply

chinsoon12 · Answer

Here is a possible data.table approach using non-equi joins

library(data.table)
cols <- names(df2)

#convert into data.table and add row index for clarity
setDT(df1)[, rn1 := .I]
setDT(df2)[, rn2 := .I]

#create a lower (-1) and upper (+1) bound on each column
bandsNames <- paste0(rep(cols, each=2L), "_", rep(c("lower", "upper"), length(cols)))
df2Bands <- df2[, 
    {
        ans <- do.call(cbind, lapply(.SD, function(x) outer(x, c(-1L, 1L), `+`)))
        setnames(data.table(ans), bandsNames)
    }, by=.(rn2)]

#create the non-equi join conditions
lowerLimits <- paste0(cols, "_lower<=", cols)
upperLimits <- paste0(cols, "_upper>=", cols)

#perform the non-equi join on lower and upper limits and return the count
#`:=` add a new column in df1 by reference
df1[, Count := 
        df2Bands[df1, .N, by=.EACHI, on=c(lowerLimits, upperLimits)]$N
    ]

desired output:

   a1 a2 a3 a4 a5 rn1 Count
1:  3  8  4  9 17   1     6
2:  4  8 18  9 30   2     0

If you want to find the matching rows as well:

df2Bands[df1, .(rn1=i.rn1, rn2=x.rn2), by=.EACHI, on=c(lowerLimits, upperLimits)][, 
    -(1L:length(bandsNames))]

Matched rows:

Aaron Hayman · Answer

I do not think there is a good way to fully vectorise this problem, (apply family are really just for loops in a bow tie). But you can do it on a by column basis, rather than by row. If further improvement is required the size of the problem can be reduced after each column by removing rows that can be excluded from ever matching (this will cause an indexing headache, but is relatively do-able).

My attempt is below which uses a for loop (which could be replaced by lapply). It returns a truth matrix, rows with a 1 can be matched to columns with a 1, which gives the pairing of neighbours.

col_comp = function(x,y)
{
    lx = length(x)
    ly = length(y)
    return(abs(rep(x,ly) - rep(y,each = lx) )<=1)
}

full_comp=function(df1,df2)
{
    rows1 = seq_len(nrow(df1))
    rows2 = seq_len(nrow(df2))
    M = matrix(1L, nrow=nrow(df1),ncol=nrow(df2))
    for(i in seq_len(ncol(df1)) )
    {
        matches = col_comp(df1[rows1,i],df2[rows2,i])

        M = M*matches
    }
    return(M)
}

vectorially finding neighbors to data frame rows

Tags:

dataframe

r

vectorization

Omry Atia

3 Answers

Jim Chen

chinsoon12

Aaron Hayman

Recent Activity

Donate For Us

vectorially finding neighbors to data frame rows

Tags:

dataframe

r

vectorization

Omry Atia

3 Answers

Jim Chen

chinsoon12

Aaron Hayman

Related questions

Recent Activity

Donate For Us