I have a data.table:
groups <- data.table(group = c("A", "B", "C", "D", "E", "F", "G"),
code_1 = c(2,2,2,7,8,NA,5),
code_2 = c(NA,3,NA,3,NA,NA,2),
code_3 = c(4,1,1,4,4,1,8))
group code_1 code_2 code_3
A 2 NA 4
B 2 3 1
C 2 NA 1
D 7 3 4
E 8 NA 4
F NA NA 1
G 5 2 8
What I would like to achieve, is for each group to find the immediate neighbors based on the available codes. For example: Group A has immediate neighbors groups B, C due to code_1 (code_1 is equal to 2 in all groups) and has immediate neighbor groups D,E due to code_3 (code_3 is equal to 4 in all those groups).
What I tried is for each code, subsetting the first column (group) based on the matches as follows:
groups$code_1_match = list()
for (row in 1:nrow(groups)){
set(groups, i=row, j="code_1_match", list(groups$group[groups$code_1[row] == groups$code_1]))
}
group code_1 code_2 code_3 code_1_match
A 2 NA 4 A,B,C,NA
B 2 3 1 A,B,C,NA
C 2 NA 1 A,B,C,NA
D 7 3 4 D,NA
E 8 NA 4 E,NA
F NA NA 1 NA,NA,NA,NA,NA,NA,...
G 5 2 8 NA,G
This "kinda" works but I would assume there is a more data table kind of way of doing this. I tried
groups[, code_1_match_2 := list(group[code_1 == groups$code_1])]
But this doesn't work.
Am I missing some obvious data table trick to deal with it?
My ideal case result would look like this (which currently would require using my method for all 3 columns and then concatenating the results):
group code_1 code_2 code_3 Immediate neighbors
A 2 NA 4 B,C,D,E
B 2 3 1 A,C,D,F
C 2 NA 1 A,B,F
D 7 3 4 B,A
E 8 NA 4 A,D
F NA NA 1 B,C
G 5 2 8
To get the index of the neighbors do something like: (remember there is a mode argument) To do what Gabor suggests you need to do the following. (I think he missed V ( ) and wrote graph$name when he meant V (graph)$name in his post) This gives the labels of the corresponding "neighbors".
The exact nearest neighbor might be across the boundary to one of the neighboring cells. The intuition of the algorithm is, that we can avoid the exhaustive search if we partition our dataset in such a way that on search, we only query relevant partitions (also called Voronoi cells).
Approximate Nearest Neighbor techniques speed up the search by preprocessing the data into an efficient index and are often tackled using these phases: Vector Transformation — applied on vectors before they are indexed, amongst them, there is dimensionality reduction and vector rotation.
Using igraph, get 2nd degree neighbours, drop numeric nodes, paste remaining nodes.
library(data.table)
library(igraph)
# reshape wide-to-long
x <- melt(groups, id.vars = "group")[!is.na(value)]
# convert to graph
g <- graph_from_data_frame(x[, .(from = group, to = paste0(variable, "_", value))])
# get 2nd degree neighbours
x1 <- ego(g, 2, nodes = groups$group)
# prettify the result
groups$res <- sapply(seq_along(x1), function(i) toString(intersect(names(x1[[ i ]]),
groups$group[ -i ])))
# group code_1 code_2 code_3 res
# 1: A 2 NA 4 B, C, D, E
# 2: B 2 3 1 A, C, D, F
# 3: C 2 NA 1 A, B, F
# 4: D 7 3 4 B, A, E
# 5: E 8 NA 4 A, D
# 6: F NA NA 1 B, C
# 7: G 5 2 8
More info
This is how our data looks like before converting to igraph object. We want to ensure code1 with value 2 is different from code2 with value 2, etc.
x[, .(from = group, to = paste0(variable, "_", value))]
# from to
# 1: A code_1_2
# 2: B code_1_2
# 3: C code_1_2
# 4: D code_1_7
# 5: E code_1_8
# 6: G code_1_5
# 7: B code_2_3
# 8: D code_2_3
# 9: G code_2_2
# 10: A code_3_4
# 11: B code_3_1
# 12: C code_3_1
# 13: D code_3_4
# 14: E code_3_4
# 15: F code_3_1
# 16: G code_3_8
Here is how our network looks like:
Note that A..G
nodes are always connected through code_x_y
.
So we need to get the 2nd degree, ego(..., order = 2)
gives us neighbours up to including 2nd degree neighbours, and returns a list object.
To get the names:
lapply(x1, names)
# [[1]]
# [1] "A" "code_1_2" "code_3_4" "B" "C" "D" "E"
#
# [[2]]
# [1] "B" "code_1_2" "code_2_3" "code_3_1" "A" "C" "D" "F"
#
# [[3]]
# [1] "C" "code_1_2" "code_3_1" "A" "B" "F"
#
# [[4]]
# [1] "D" "code_1_7" "code_2_3" "code_3_4" "B" "A" "E"
#
# [[5]]
# [1] "E" "code_1_8" "code_3_4" "A" "D"
#
# [[6]]
# [1] "F" "code_3_1" "B" "C"
#
# [[7]]
# [1] "G" "code_1_5" "code_2_2" "code_3_8"
To prettify the result, we need to remove code_x_y
nodes and the origin node (1st node)
sapply(seq_along(x1), function(i) toString(intersect(names(x1[[ i ]]), groups$group[ -i ])))
#[1] "B, C, D, E" "A, C, D, F" "A, B, F" "B, A, E" "A, D" "B, C" ""
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With