I want to count the number of unique edges in an undirected network, e.g., net:
  x y
1 A B
2 B A
3 A B
There should be only one unique edge in this data frame, because the edges A-B and B-A are the same in an undirected network.
For a directed network I can get the number of unique edges with:
nrow(unique(net[, c("x", "y")]))
But this doesn't work for the undirected network.
An edge list is a data frame that contains a minimum of two columns, one column of nodes that are the source of a connection and another column of nodes that are the target of the connection. The nodes in the data are identified by unique IDs.
Undirected graphs have edges that do not have a direction. The edges indicate a two-way relationship, in that each edge can be traversed in both directions. This figure shows a simple undirected graph with three nodes and three edges. Directed graphs have edges with direction.
Use the function graph_from_adjacency_matrix to convert your adjacency matrix into a graph and set the argument diag = FALSE. That should get rid of the self-loops.
If the edges in a network are directed, i.e., pointing in only one direction, the network is called a directed network (or a directed graph, sometimes digraph for short). When drawing a directed network, the edges are typically drawn as arrows indicating the direction, as illustrated in the first figure, below.
Given that you are working with networks, an igraph solution:
library(igraph)
as_data_frame(simplify(graph_from_data_frame(dat, directed=FALSE)))
Then use nrow on the result to count the edges.
Explanation
dat %>%
  graph_from_data_frame(., directed=FALSE) %>% # convert to undirected graph
  simplify %>%                                 # remove loops / multiple edges
  as_data_frame                                # return remaining edges
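To go straight to the count, the pipeline above can be followed by nrow, or the edges can be counted on the graph object itself with ecount. This is a minimal sketch that assumes dat is the three-row edge list from the question; the names g, n_from_df and n_from_graph are only for illustration.

```r
library(igraph)

# Assumed example data: the edge list from the question
dat <- data.frame(x = c("A", "B", "A"), y = c("B", "A", "B"))

# Build an undirected graph and drop multiple edges and self-loops
g <- simplify(graph_from_data_frame(dat, directed = FALSE))

# Count via the edge data frame (igraph:: avoids clashes with dplyr/tibble)
n_from_df <- nrow(igraph::as_data_frame(g, what = "edges"))

# Count directly on the graph object
n_from_graph <- ecount(g)

print(n_from_df)     # 1
print(n_from_graph)  # 1
```

Note that simplify also removes self-loops by default, so an edge such as A-A would not be counted.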
Try this,
df <- data.frame(x=c("A", "B", "A"), y = c("B", "A", "B"))
unique(apply(df, 1, function(x) paste(sort(unlist(strsplit(x, " "))),collapse = " ")))
[1] "A B"
So how does this work?
We are applying a function to each row of the data frame, so we can work on one row at a time. Take the second row of df,
df[2,]
  x y
2 B A
We then split (strsplit) this and unlist it into a vector with one element per letter (we use as.matrix to isolate the elements),
unlist(strsplit(as.matrix(df[2,]), " "))
[1] "B" "A"
Use the sort function to put into alphabetical order, then paste them back together,
paste(sort(unlist(strsplit(as.matrix(df[2,]), " "))), collapse = " ")
[1] "A B"
Then the apply function does this for all the rows, because we set its margin argument to 1, and the unique function identifies the unique edges.
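If the edge list has exactly two columns, a vectorised alternative to the row-wise apply is to put each pair into a canonical order with pmin and pmax and then reuse the directed-network count. This is a different technique from the one explained above, sketched on the same example data; canon and n_unique are illustrative names.

```r
# Same example data; stringsAsFactors = FALSE keeps the columns as character
# vectors on older R versions, which pmin/pmax need here
df <- data.frame(x = c("A", "B", "A"), y = c("B", "A", "B"),
                 stringsAsFactors = FALSE)

# For every row, put the alphabetically smaller node first
canon <- data.frame(a = pmin(df$x, df$y),
                    b = pmax(df$x, df$y),
                    stringsAsFactors = FALSE)

# A-B and B-A are now identical rows, so the directed count works
n_unique <- nrow(unique(canon))
print(n_unique)  # 1
```

This only covers the two-column case; for more columns the apply version is the more general one.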
Extension
This can be extended to n variables, for example n=3,
df <- data.frame(x=c("A", "B", "A"), y = c("B", "A", "B"), z = c("C", "D", "D"))
unique(apply(df, 1, function(x) paste(sort(unlist(strsplit(x, " "))),collapse = " ")))
[1] "A B C" "A B D"
Node names that are longer than one letter work the same way, for example,
df <- data.frame(x=c("A", "BC", "A"), y = c("B", "A", "BC"))
df
   x  y
1  A  B
2 BC  A
3  A BC
unique(apply(df, 1, function(x) paste(sort(unlist(strsplit(x, " "))),collapse = " ")))
[1] "A B" "A BC"
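Since the same line is reused in every example, it could be wrapped in a small function that returns the count directly. This is only a sketch: count_unique_edges is a hypothetical helper name, and it assumes that all columns hold node names as character strings without spaces in them (a space inside a name could make two different edges look alike after pasting).

```r
# Hypothetical helper: number of unique undirected edges in an edge list
count_unique_edges <- function(edges) {
  # Sort the node names within each row and paste them into one key
  keys <- apply(as.matrix(edges), 1,
                function(row) paste(sort(row), collapse = " "))
  # Rows that describe the same edge now share the same key
  length(unique(keys))
}

df1 <- data.frame(x = c("A", "B", "A"), y = c("B", "A", "B"))
df2 <- data.frame(x = c("A", "BC", "A"), y = c("B", "A", "BC"))

print(count_unique_edges(df1))  # 1
print(count_unique_edges(df2))  # 2
```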
Old version
Using the tidyverse package, create a function called rev that can order our edges (note that this masks base R's rev), then use mutate to create a new column combining the x and y columns in a form that works with the rev function. Finally, run the new column through the function and find the unique pairs.
library(tidyverse)
rev <- function(x){
  unname(sapply(x, function(x) {
    paste(sort(trimws(strsplit(x[1], ',')[[1]])), collapse=',')} ))
}
df <- data.frame(x=c("A", "B", "A"), y = c("B", "A", "B"))
rows <- df %>%
  mutate(both = c(paste(x, y, sep = ", ")))
unique(rev(rows$both))
Here is a solution without the intervention of igraph, all inside one pipe:
library(dplyr)
df = tibble(x=c("A", "B", "A"), y = c("B", "A", "B"))
You can use group_by() and then sort() the combination of values and paste() them into a new column via mutate(). unique() is needed in case you have "true" duplicates (A-B and A-B end up in the same group).
df %>%
  group_by(x, y) %>%
  mutate(edge_id = paste(sort(unique(c(x,y))), collapse=" "))
When you have properly sorted edge names in a new column, it's quite straightforward to count unique values or filter duplicates out of your data frame.
If you have additional variables for the edges, just add them to the grouping.
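As a sketch of that last step, the sorted edge_id column can be counted with n_distinct() or deduplicated with distinct(). The example reuses the pipe from this answer on the same data; edges, n_edges and deduped are illustrative names.

```r
library(dplyr)

df <- tibble(x = c("A", "B", "A"), y = c("B", "A", "B"))

# Same pipe as above, ungrouped at the end so later steps see all rows
edges <- df %>%
  group_by(x, y) %>%
  mutate(edge_id = paste(sort(unique(c(x, y))), collapse = " ")) %>%
  ungroup()

# Count the unique undirected edges
n_edges <- n_distinct(edges$edge_id)
print(n_edges)  # 1

# Keep only the first row for each undirected edge
deduped <- distinct(edges, edge_id, .keep_all = TRUE)
print(nrow(deduped))  # 1
```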