
Find immediate neighbors by group using data table or igraph

I have a data.table:

groups <- data.table(group = c("A", "B", "C", "D", "E", "F", "G"), 
                     code_1 = c(2,2,2,7,8,NA,5),
                     code_2 = c(NA,3,NA,3,NA,NA,2),
                     code_3 = c(4,1,1,4,4,1,8))

group code_1 code_2 code_3
  A      2     NA      4
  B      2      3      1
  C      2     NA      1
  D      7      3      4
  E      8     NA      4
  F     NA     NA      1
  G      5      2      8

For each group, I would like to find its immediate neighbors based on the available codes. For example, group A has immediate neighbors B and C through code_1 (code_1 is equal to 2 in all three groups) and immediate neighbors D and E through code_3 (code_3 is equal to 4 in all three groups).

What I tried is for each code, subsetting the first column (group) based on the matches as follows:

groups$code_1_match = list()
for (row in 1:nrow(groups)){

  set(groups, i=row, j="code_1_match", list(groups$group[groups$code_1[row] == groups$code_1]))
}

  group code_1 code_2 code_3          code_1_match
    A      2     NA      4              A,B,C,NA
    B      2      3      1              A,B,C,NA
    C      2     NA      1              A,B,C,NA
    D      7      3      4                  D,NA
    E      8     NA      4                  E,NA
    F     NA     NA      1 NA,NA,NA,NA,NA,NA,...
    G      5      2      8                  NA,G

This sort of works, but I assume there is a more idiomatic data.table way of doing this. I tried

groups[, code_1_match_2 := list(group[code_1 == groups$code_1])]

But this doesn't work.

Am I missing some obvious data.table trick to deal with it?

My ideal case result would look like this (which currently would require using my method for all 3 columns and then concatenating the results):

group code_1 code_2 code_3    Immediate neighbors
  A      2     NA      4         B,C,D,E
  B      2      3      1         A,C,D,F
  C      2     NA      1         A,B,F
  D      7      3      4         B,A,E
  E      8     NA      4           A,D
  F     NA     NA      1           B,C
  G      5      2      8           
User2321 asked Feb 03 '20 10:02


1 Answer

Using igraph: get the neighbours up to the 2nd degree, drop the code nodes, and paste the remaining group nodes together.

library(data.table)
library(igraph)

# reshape wide-to-long
x <- melt(groups, id.vars = "group")[!is.na(value)]

# convert to graph
g <- graph_from_data_frame(x[, .(from = group, to = paste0(variable, "_", value))])

# get 2nd degree neighbours
x1 <- ego(g, 2, nodes = groups$group)

# prettify the result
groups$res <- sapply(seq_along(x1), function(i) toString(intersect(names(x1[[ i ]]),
                                                                   groups$group[ -i ])))

#    group code_1 code_2 code_3        res
# 1:     A      2     NA      4 B, C, D, E
# 2:     B      2      3      1 A, C, D, F
# 3:     C      2     NA      1    A, B, F
# 4:     D      7      3      4    B, A, E
# 5:     E      8     NA      4       A, D
# 6:     F     NA     NA      1       B, C
# 7:     G      5      2      8           
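
For comparison, since the question also asks for a data.table way, here is a minimal sketch that avoids igraph by self-joining the long table on (variable, value). The names long, pairs, nb, nb_str, out and the column res_dt are illustrative, not part of the original answer, and neighbours come out sorted alphabetically rather than in the order igraph returns them.

```r
library(data.table)

groups <- data.table(group = c("A", "B", "C", "D", "E", "F", "G"),
                     code_1 = c(2, 2, 2, 7, 8, NA, 5),
                     code_2 = c(NA, 3, NA, 3, NA, NA, 2),
                     code_3 = c(4, 1, 1, 4, 4, 1, 8))

# reshape wide-to-long, dropping missing codes
long <- melt(groups, id.vars = "group", na.rm = TRUE)

# self-join: pair every group with every group sharing the same (variable, value);
# allow.cartesian is needed because the join returns more rows than both inputs combined
pairs <- long[long, on = .(variable, value), allow.cartesian = TRUE]

# the clashing "group" column from the right-hand table is named i.group;
# drop self-matches and duplicate pairs
nb <- unique(pairs[group != i.group, .(group = i.group, neighbour = group)])

# one comma-separated string per group (hypothetical column name res_dt)
nb_str <- nb[order(group, neighbour), .(res_dt = toString(neighbour)), by = group]

# left join back so that groups without neighbours (G) are kept, with NA
out <- merge(groups, nb_str, by = "group", all.x = TRUE)
out
```

In this sketch a group without neighbours gets NA instead of an empty string; whether that is preferable depends on how the result is used downstream.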

More info

This is what our data looks like before converting to an igraph object. We prefix each value with its column name to ensure that code_1 with value 2 is a different node from code_2 with value 2, etc.

x[, .(from = group, to = paste0(variable, "_", value))]
#     from       to
#  1:    A code_1_2
#  2:    B code_1_2
#  3:    C code_1_2
#  4:    D code_1_7
#  5:    E code_1_8
#  6:    G code_1_5
#  7:    B code_2_3
#  8:    D code_2_3
#  9:    G code_2_2
# 10:    A code_3_4
# 11:    B code_3_1
# 12:    C code_3_1
# 13:    D code_3_4
# 14:    E code_3_4
# 15:    F code_3_1
# 16:    G code_3_8
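
To see why the prefix matters, the following small sketch looks only at the rows with value 2. The names twos, bare and prefixed are illustrative only. Without the prefix, G (code_2 = 2) would land on the same node as A, B and C (code_1 = 2) and would wrongly show up as their neighbour.

```r
library(data.table)

groups <- data.table(group = c("A", "B", "C", "D", "E", "F", "G"),
                     code_1 = c(2, 2, 2, 7, 8, NA, 5),
                     code_2 = c(NA, 3, NA, 3, NA, NA, 2),
                     code_3 = c(4, 1, 1, 4, 4, 1, 8))

x <- melt(groups, id.vars = "group", na.rm = TRUE)

# compare the node name with and without the column prefix for value 2
twos <- x[value == 2, .(group,
                        bare = as.character(value),
                        prefixed = paste0(variable, "_", value))]
twos

# number of distinct nodes each naming scheme would produce
uniqueN(twos$bare)
uniqueN(twos$prefixed)
```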

Here is what our network looks like: [figure: plot of the network]

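Note that the A..G nodes are only ever connected to each other through a code_x_y node, so two groups that share a code are two steps apart. That is why we need the 2nd degree: ego(..., order = 2) gives us all neighbours up to and including the 2nd degree, and returns a list object with one element per starting node.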

To get the names:

lapply(x1, names)
# [[1]]
# [1] "A"        "code_1_2" "code_3_4" "B"        "C"        "D"        "E"       
# 
# [[2]]
# [1] "B"        "code_1_2" "code_2_3" "code_3_1" "A"        "C"        "D"        "F"       
# 
# [[3]]
# [1] "C"        "code_1_2" "code_3_1" "A"        "B"        "F"       
# 
# [[4]]
# [1] "D"        "code_1_7" "code_2_3" "code_3_4" "B"        "A"        "E"       
# 
# [[5]]
# [1] "E"        "code_1_8" "code_3_4" "A"        "D"       
# 
# [[6]]
# [1] "F"        "code_3_1" "B"        "C"       
# 
# [[7]]
# [1] "G"        "code_1_5" "code_2_2" "code_3_8"

To prettify the result, we need to remove the code_x_y nodes and the origin node (the 1st node):

sapply(seq_along(x1), function(i) toString(intersect(names(x1[[ i ]]), groups$group[ -i ])))
#[1] "B, C, D, E" "A, C, D, F" "A, B, F"    "B, A, E"    "A, D"       "B, C"       ""   
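
As a possible variation, ego() also takes a mindist argument, so the filtering can be left to igraph: with order = 2 and mindist = 2 only nodes exactly two steps away are returned, and in this graph those are always group nodes. This is a sketch under the assumption that every neighbour sits at distance 2, which holds here because groups only link to code nodes. The names x2 and res2 are illustrative, and the neighbours are sorted alphabetically.

```r
library(data.table)
library(igraph)

groups <- data.table(group = c("A", "B", "C", "D", "E", "F", "G"),
                     code_1 = c(2, 2, 2, 7, 8, NA, 5),
                     code_2 = c(NA, 3, NA, 3, NA, NA, 2),
                     code_3 = c(4, 1, 1, 4, 4, 1, 8))

# same long format and graph as above
x <- melt(groups, id.vars = "group", na.rm = TRUE)
g <- graph_from_data_frame(x[, .(from = group, to = paste0(variable, "_", value))])

# keep only nodes at distance exactly 2: skips the origin (0) and code nodes (1)
x2 <- ego(g, order = 2, nodes = groups$group, mindist = 2)

# collapse each neighbourhood to a string; as.character() guards the empty case (G)
res2 <- sapply(x2, function(v) toString(sort(as.character(names(v)))))
res2
```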
zx8754 answered Sep 18 '22 15:09