
Combine rows that have common elements

I have a data.table that looks like this:

# Load packages
library(data.table)

# Set RNG seed
set.seed(-1)

# Create dummy data
dt <- data.table(foo = sample(letters[1:10], 6),
                 bar = sample(letters[1:10], 6))

dt
#>    foo bar
#> 1:   g   a
#> 2:   h   j
#> 3:   j   e
#> 4:   a   i
#> 5:   d   g
#> 6:   i   c

I would like to group together all associated elements. What I mean by that is, for example, a and g are together in the first row, so they belong together in a group (a, g). But a and i are together on row 4, so i also belongs to this group (a, g, i). Also, i is associated with c on row 6, so c also belongs to the group (a, g, i, c). On row 5, d and g are together, so d also belongs to this group (a, g, i, c, d).

Applying this logic gives the following desired result.

# Desired result
# [[1]]
# [1] a c d g i
# [[2]]
# [1] e h j

I have some code that achieves this result, but nesting a mapply in a while loop together with some really clunky handling of data structures makes me think that this is far from optimal.

# Loop counter
i <- 1

# List of groups
res <- list()

while(nrow(dt)>0){
  # Add first row to list
  res[[i]] <- unlist(dt[1])

  # Check each row in dt
  mapply(function(x, y){

    # If there are common elements between current row and current group
    if(length(intersect(c(x, y), res[[i]])) > 0){
      # Add elements from this row to this group
      res[[i]] <<- c(res[[i]], x, y)
    }

  }, dt$foo, dt$bar)

  # Only keep unique elements
  res[[i]] <- unique(res[[i]])

  # Remove rows that have elements in the current group
  dt <- dt[!(foo %in% res[[i]] | bar %in% res[[i]])]

  # Increment loop counter
  i <- i + 1
}

gives,

res
#> [[1]]
#> [1] "g" "a" "i" "d" "c"
#> 
#> [[2]]
#> [1] "h" "j" "e"

as required.

Is there a more elegant and efficient way of achieving this result?


Your data can be treated as a graph: each letter is a vertex and each row is an edge joining two letters. The groups you are after are then the connected components of that graph, and the igraph package can find them for you.

First, create a graph from your data frame of edges:

library(data.table)
library(igraph)

set.seed(-1)

foo <- sample(letters[1:10], 6)
bar <- sample(letters[1:10], 6)

edges <- data.table(foo, bar)

net <- igraph::graph_from_data_frame(d = edges, directed = FALSE)
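
As a quick sanity check, it may help to confirm that the graph has the shape you expect before looking for components. The sketch below types the edge list in by hand from the table printed in the question (an assumption made so that the result does not depend on how sample() behaves in your R version) and uses igraph's vcount(), ecount() and is_connected():

```r
library(igraph)

# Edge list copied by hand from the table printed in the question,
# rather than regenerated with set.seed() and sample()
edges <- data.frame(
  foo = c("g", "h", "j", "a", "d", "i"),
  bar = c("a", "j", "e", "i", "g", "c")
)

net <- graph_from_data_frame(d = edges, directed = FALSE)

n_vertices <- vcount(net)        # one vertex per distinct letter
n_edges    <- ecount(net)        # one edge per row
connected  <- is_connected(net)  # FALSE when there is more than one group

n_vertices
n_edges
connected
```

If is_connected() returned TRUE, every letter would end up in a single group and there would be nothing to split.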

You can then find the connected components of the graph:

components(net)

# $membership
# g h j a d i e c 
# 1 2 2 1 1 1 2 1 
#
# $csize
# [1] 5 3
#
# $no
# [1] 2
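
If you would rather keep working with the table itself, the membership vector can be joined back onto the rows. This is only a sketch: the column name group is my own choice, and the edge list is again typed in from the question's printed table rather than regenerated.

```r
library(data.table)
library(igraph)

# Edge list copied by hand from the table printed in the question
edges <- data.table(
  foo = c("g", "h", "j", "a", "d", "i"),
  bar = c("a", "j", "e", "i", "g", "c")
)

net  <- graph_from_data_frame(d = edges, directed = FALSE)
memb <- components(net)$membership  # named vector: letter -> component id

# Both ends of an edge always share a component, so looking up foo is enough.
# "group" is a hypothetical column name chosen for this example.
edges[, group := unname(memb[foo])]

edges
```

Rows 1, 4, 5 and 6 should share one group id and rows 2 and 3 the other.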

Or get a nicer list of the vertices contained in each component:

split(names(V(net)), components(net)$membership)
# $`1`
# [1] "g" "a" "d" "i" "c"
# 
# $`2`
# [1] "h" "j" "e"
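
To get something closer to the layout of the desired result in the question (an unnamed list with each group in alphabetical order), you could sort within each component and drop the list names. A minimal sketch, with the edge list typed in from the question's printed table:

```r
library(igraph)

# Edge list copied by hand from the table printed in the question
edges <- data.frame(
  foo = c("g", "h", "j", "a", "d", "i"),
  bar = c("a", "j", "e", "i", "g", "c")
)

net  <- graph_from_data_frame(d = edges, directed = FALSE)
memb <- components(net)$membership

# Split the vertex names by component, sort each group alphabetically,
# and drop the "1", "2", ... names that split() adds
groups <- unname(lapply(split(names(memb), memb), sort))

groups
```

One group should contain a, c, d, g and i, and the other e, h and j.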

Tags:

r

Lyngbakr

1 Answers

L_W
