I am simulating network change over time using igraph
in r
and am looking for an efficient and scalable way to code this for use in business.
The main drivers of network change are:
In the first stage, in the network of 100 nodes 10% are randomly connected. The node weights are also assigned at random. The network is undirected. There are 100 stages.
In each of the following stages:
How can this be written?
Edit: these networks will be examined on a number of graph-level characteristics at a later stage
This is what I have now, but doesn't include the node weights. How do we include this efficiently?
# number of nodes and ties to start with
n = 100
p = 0.1
r = -2
# build random network
net1 <- erdos.renyi.game(n, p, "gnp", directed = F)
#plot(net1)
write_graph(net1, paste0("D://network_sim_0.dl"), format="pajek")
for(i in seq(1,100,1)){
print(i)
time <- proc.time()
net1 <- read_graph(paste0("D://network_sim_",i-1,".dl"), format="pajek")
# how many will we build in next stage?
new_ties <- round(0.1*ecount(net1), 0) # 10% of those in net1
# add 10 new nodes
net2 <- add_vertices(net1, 10)
# get network distance for each dyad in net1 + the new nodes
spel <- data.table::melt(shortest.paths(net2))
names(spel) <- c("node_i", "node_j", "distance")
# replace inf with max observed value + 1
spel$distance[which(!is.finite(spel$distance))] <- max(spel$distance[is.finite(spel$distance)]) +1
# assign a probability (?) with a exponential decay function. Smallest distance == greatest prob.
spel$prob <- -0.5 * spel$distance^r # is this what I need?
#hist(spel$prob, freq=T, xlab="Probability of tie-formation")
#hist(spel$distance, freq=T, xlab="Network Distance")
# lets sample new ties from this probability
spel$index <- seq_along(spel$prob)
to_build <- subset(spel, index %in% sample(spel$index, size = new_ties, prob=spel$prob))
net2 <- add_edges(net2, as.numeric(unlist(str_split(paste(to_build$node_i, to_build$node_j), " "))))
# save the network
write_graph(net2, paste0("D://network_sim_",i,".dl"), format="pajek")
print(proc.time()-time)
}
I will try to answer this question, as far as I understand.
There are a couple of assumptions I made; I should clarify them.
First, what distribution will node weights follow?
If you are modeling an event that naturally occurs, it is most likely that the node weights follow a normal distribution. However, if the event is socially-oriented and other social mechanisms influence the event or the event popularity, the node weights might follow a different distribution-- mostly likely a power distribution.
Mainly, this is likely to true for customer-related behaviors. So, it would be beneficial for you to consider the random distribution you will model for the node weights.
For the following example, I use normal distributions to define value from a normal distribution for each node. At the end of each iteration, I let the node weights change up to %10 {.9,1.10}.
Second, what is the probability function of tie formation?
We have two inputs for making a decision: distance weights and node weights. So, we will create a function by using these two inputs and define probability weights. What I understood is that the smaller the distance is, the higher the likelihood is. And then the greater the node weight is, the higher the likelihood is, as well.
It might not be the best solution, but I did the followings:
First, calculate the decay function of distances and call it distance weights. Then, I get the node weights and create a super-linear function using both distance and node weights.
So, there are some parameters you can play with and see whether you get a result you want.
Btw, I did not change most of your codes. Also, I did not focus on processing time a lot. There are still rooms to impove.
library(scales)
library(stringr)
library(igraph)
# number of nodes and ties to start with
n <- 100
p <- 0.2
number_of_simulation <- 100
new_nodes <- 15 ## new nodes for each iteration
## Parameters ##
## How much distance will be weighted?
## Exponential decay parameter
beta_distance_weight <- -.4
## probability function parameters for the distance and node weights
impact_of_distances <- 0.3 ## how important is the distance weights?
impact_of_nodes <- 0.7 ## how important is the node weights?
power_base <- 5.5 ## how important is having a high score? Prefential attachment or super-linear function
# build random network
net1 <- erdos.renyi.game(n, p, "gnp", directed = F)
# Assign normally distributed random weights
V(net1)$weight <- rnorm(vcount(net1))
graph_list <- list(net1)
for(i in seq(1,number_of_simulation,1)){
print(i)
time <- proc.time()
net1 <- graph_list[[i]]
# how many will we build in next stage?
new_ties <- round(0.1*ecount(net1), 0) # 10% of those in net1
# add 10 new nodes
net2 <- add_vertices(net1, new_nodes)
## Add random weights to new nodes from a normal distribution
V(net2)$weight[is.na(V(net2)$weight)] <- rnorm(new_nodes)
# get network distance for each dyad in net1 + the new nodes
spel <- reshape2::melt(shortest.paths(net2))
names(spel) <- c("node_i", "node_j", "distance")
# replace inf with max observed value + 1
spel$distance[which(!is.finite(spel$distance))] <- max(spel$distance[is.finite(spel$distance)]) +1
# Do not select nodes if they are self-looped or have already link
spel <- spel[!spel$distance %in% c(0,1) , ]
# Assign distance weights for each dyads
spel$distance_weight <- exp(beta_distance_weight*spel$distance)
#hist(spel$distance_weight, freq=T, xlab="Probability of tie-formation")
#hist(spel$distance, freq=T, xlab="Network Distance")
## Get the node weights for merging the data with the distances
node_weights <- data.frame(id= 1:vcount(net2),node_weight=V(net2)$weight)
spel <- merge(spel,node_weights,by.x='node_j',by.y='id')
## probability is the function of distince and node weight
spel$prob <- power_base^((impact_of_distances * spel$distance_weight) + (impact_of_nodes * spel$node_weight))
spel <- spel[order(spel$prob, decreasing = T),]
# lets sample new ties from this probability with a beta distribution
spel$index <- seq_along(spel$prob)
to_build <- subset(spel, index %in% sample(spel$index, new_ties, p = 1/spel$index ))
net2 <- add_edges(net2, as.numeric(unlist(str_split(paste(to_build$node_i, to_build$node_j), " "))))
# change in the weights up to %10
V(net2)$weight <- V(net2)$weight*rescale(rnorm(vcount(net2)), to = c(0.9, 1.1))
graph_list[[i+1]] <- net2
print(proc.time()-time)
}
To get the results or write the graph to Pajek, you can use the following:
lapply(seq_along(graph_list),function(x) write_graph(graph_list[[x]], paste0("network_sim_",x,".dl"), format="pajek"))
To change the node weight, you can use the following syntax.
library(scales)
library(stringr)
library(igraph)
# number of nodes and ties to start with
n <- 100
p <- 0.2
number_of_simulation <- 100
new_nodes <- 10 ## new nodes for each iteration
## Parameters ##
## How much distance will be weighted?
## Exponential decay parameter
beta_distance_weight <- -.4
## Node weights for power-law dist
power_law_parameter <- -.08
## probability function parameters for the distance and node weights
impact_of_distances <- 0.3 ## how important is the distance weights?
impact_of_nodes <- 0.7 ## how important is the node weights?
power_base <- 5.5 ## how important is having a high score? Prefential attachment or super-linear function
# build random network
net1 <- erdos.renyi.game(n, p, "gnp", directed = F)
## MADE A CHANGE HERE
# Assign normally distributed random weights
V(net1)$weight <- runif(vcount(net1))^power_law_parameter
graph_list <- list(net1)
for(i in seq(1,number_of_simulation,1)){
print(i)
time <- proc.time()
net1 <- graph_list[[i]]
# how many will we build in next stage?
new_ties <- round(0.1*ecount(net1), 0) # 10% of those in net1
# add 10 new nodes
net2 <- add_vertices(net1, new_nodes)
## Add random weights to new nodes from a normal distribution
V(net2)$weight[is.na(V(net2)$weight)] <- runif(new_nodes)^power_law_parameter
# get network distance for each dyad in net1 + the new nodes
spel <- reshape2::melt(shortest.paths(net2))
names(spel) <- c("node_i", "node_j", "distance")
# replace inf with max observed value + 1
spel$distance[which(!is.finite(spel$distance))] <- max(spel$distance[is.finite(spel$distance)]) + 2
# Do not select nodes if they are self-looped or have already link
spel <- spel[!spel$distance %in% c(0,1) , ]
# Assign distance weights for each dyads
spel$distance_weight <- exp(beta_distance_weight*spel$distance)
#hist(spel$distance_weight, freq=T, xlab="Probability of tie-formation")
#hist(spel$distance, freq=T, xlab="Network Distance")
## Get the node weights for merging the data with the distances
node_weights <- data.frame(id= 1:vcount(net2),node_weight=V(net2)$weight)
spel <- merge(spel,node_weights,by.x='node_j',by.y='id')
## probability is the function of distince and node weight
spel$prob <- power_base^((impact_of_distances * spel$distance_weight) + (impact_of_nodes * spel$node_weight))
spel <- spel[order(spel$prob, decreasing = T),]
# lets sample new ties from this probability with a beta distribution
spel$index <- seq_along(spel$prob)
to_build <- subset(spel, index %in% sample(spel$index, new_ties, p = 1/spel$index ))
net2 <- add_edges(net2, as.numeric(unlist(str_split(paste(to_build$node_i, to_build$node_j), " "))))
# change in the weights up to %10
V(net2)$weight <- V(net2)$weight*rescale(rnorm(vcount(net2)), to = c(0.9, 1.1))
graph_list[[i+1]] <- net2
print(proc.time()-time)
}
So, to validate whether the code is working, I checked a small number of iteration with limited nodes: 10 iterations with 4 nodes. For each iteration, I added 3 new nodes and one new tie.
I did this simulation with three different settings.
The first setting focuses on only the weight function of distances: the more close nodes are, the more likely that a new tie will be formed between them.
The second setting focuses on only the weight function of node: the more weight nodes have, the more likely that a new tie will be formed with them.
The third setting focuses on the weight functions of both distance and node: the more weight nodes have and the more they are close, the more likely that a new tie will be formed with them.
Please observe the network behaviors how each setting provided different results.
Only Node Weight Matters
Both Node Weight and Distance Matter
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With