I have a data.table that looks like this:
# Load packages
library(data.table)
# Set RNG seed
set.seed(-1)
# Create dummy data
dt <- data.table(foo = sample(letters[1:10], 6),
bar = sample(letters[1:10], 6))
dt
#> foo bar
#> 1: g a
#> 2: h j
#> 3: j e
#> 4: a i
#> 5: d g
#> 6: i c
I would like to group together all associated elements. What I mean by that is, for example, a and g are together in the first row, so they belong together in a group (a, g). But a and i are together on row 4, so i also belongs to this group (a, g, i). Also, i is associated with c on row 6, so c also belongs to the group (a, g, i, c). On row 5, d and g are together, so d also belongs to this group (a, g, i, c, d).
Applying this logic gives the following desired result.
# Desired result
# [[1]]
# [1] a c d g i
# [[2]]
# [1] e h j
I have some code that achieves this result, but nesting a mapply in a while loop, together with some really clunky handling of data structures, makes me think that this is far from optimal.
# Loop counter
i <- 1
# List of groups
res <- list()
while(nrow(dt)>0){
# Add first row to list
res[[i]] <- unlist(dt[1])
# Check each row in dt
mapply(function(x, y){
# If there are common elements between current row and current group
if(length(intersect(c(x, y), res[[i]])) > 0){
# Add elements from this row to this group
res[[i]] <<- c(res[[i]], x, y)
}
}, dt$foo, dt$bar)
# Only keep unique elements
res[[i]] <- unique(res[[i]])
# Remove rows that have elements in the current group
dt <- dt[!(foo %in% res[[i]] | bar %in% res[[i]])]
# Increment loop counter
i <- i + 1
}
This gives,
res
#> [[1]]
#> [1] "g" "a" "i" "d" "c"
#>
#> [[2]]
#> [1] "h" "j" "e"
as required.
Is there a more elegant and efficient way of achieving this result?
Your data can be treated as an undirected graph: each row is an edge between two vertices, and the groups you are after are the graph's connected components. The igraph package is designed for this kind of analysis.
First, create a graph from your table of edges:
library(data.table)
library(igraph)
set.seed(-1)
foo = sample(letters[1:10], 6)
bar = sample(letters[1:10], 6)
edges <- data.table(foo, bar)
net <- igraph::graph_from_data_frame(d = edges, directed = FALSE)
You can then find the connected components of the graph:
components(net)
# $membership
# g h j a d i e c
# 1 2 2 1 1 1 2 1
#
# $csize
# [1] 5 3
#
# $no
# [1] 2
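Since the membership element is a vector named by vertex, it can also serve as a lookup to label each row of the original table. The following is only a minimal sketch: the column name group and the variable memb are my own choices, and the data is typed in literally (rather than sampled) so the block does not depend on the R version's random number generator.

```r
library(data.table)
library(igraph)

# Same edges as in the example above, written out explicitly
edges <- data.table(foo = c("g", "h", "j", "a", "d", "i"),
                    bar = c("a", "j", "e", "i", "g", "c"))
net <- graph_from_data_frame(d = edges, directed = FALSE)

# Named vector: vertex name -> component id
memb <- components(net)$membership

# Both ends of an edge lie in the same component, so looking up foo is enough
edges[, group := unname(memb[foo])]
edges
```

Rows 1, 4, 5 and 6 should end up sharing one group id, and rows 2 and 3 the other.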
Or get a nicer list of the vertices contained in each component:
split(names(V(net)), components(net)$membership)
# $`1`
# [1] "g" "a" "d" "i" "c"
#
# $`2`
# [1] "h" "j" "e"
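If adding igraph as a dependency is not an option, the same grouping can be approximated in base R with a small union-find (disjoint-set) structure. This is a different technique from what components() does internally, and it is only a sketch: the function name group_connected and its helper find_root are hypothetical, and the inputs are assumed to be character vectors of equal length.

```r
# Union-find sketch in base R; group_connected and find_root are made-up names
group_connected <- function(from, to) {
  nodes <- unique(c(from, to))
  # parent[[x]] is the node that x currently points to; roots point to themselves
  parent <- setNames(nodes, nodes)

  # Follow parent links until reaching a node that is its own parent
  find_root <- function(x) {
    while (parent[[x]] != x) x <- parent[[x]]
    x
  }

  # Each edge merges the two sets its endpoints belong to
  for (k in seq_along(from)) {
    ra <- find_root(from[k])
    rb <- find_root(to[k])
    if (ra != rb) parent[[rb]] <- ra
  }

  # Collect node names by final root and sort within each group
  roots <- vapply(nodes, find_root, character(1))
  unname(lapply(split(nodes, roots), sort))
}

foo <- c("g", "h", "j", "a", "d", "i")
bar <- c("a", "j", "e", "i", "g", "c")
groups <- group_connected(foo, bar)
groups
```

For the example data this yields the two sorted groups from the question (a c d g i and e h j). The sketch has no path compression or union by rank, so for large tables igraph is likely the better choice.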