Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Adding ties in a network based on node attribute (weight)

Tags:

I am simulating network change over time using igraph in r and am looking for an efficient and scalable way to code this for use in business.

The main drivers of network change are:

  • New nodes
  • New ties
  • New node weights

In the first stage, in the network of 100 nodes 10% are randomly connected. The node weights are also assigned at random. The network is undirected. There are 100 stages.

In each of the following stages:

  • Ten (10) new nodes occur randomly and are added to the model. They are unconnected in this stage.
  • The node weights of these new nodes are assigned at random.
  • The new ties between two nodes in time t+1 are a probabilistic function of the network distance between these nodes in the network and the node weight at previous stage (time t). Nodes at greater network distance are less likely to connect than nodes nodes at shorter distance. The decay function is exponential.
  • Nodes with greater weight attract more ties than those with smaller weights. The relationship between node weight and increased probability of tie-formation should be super-linear.
  • In each step, 10% of the total existing ties is added as a function what the previous point.
  • The network ties and nodes from previous stages are carried over (i.e. the networks are cumulative).
  • At each stage, the node weight can change randomly up to 10% of its current weight (i.e. a weight of 1 can change to {0.9-1.1} in t+1)
  • At each stage, the network needs to be saved.

How can this be written?

Edit: these networks will be examined on a number of graph-level characteristics at a later stage


This is what I have now, but doesn't include the node weights. How do we include this efficiently?

# number of nodes and ties to start with
n = 100
p = 0.1
r = -2


# build random network
net1 <- erdos.renyi.game(n, p, "gnp", directed = F)
#plot(net1)
write_graph(net1, paste0("D://network_sim_0.dl"), format="pajek")


for(i in seq(1,100,1)){

print(i) 
time <- proc.time()

net1 <- read_graph(paste0("D://network_sim_",i-1,".dl"), format="pajek")  

# how many will we build in next stage?
new_ties <- round(0.1*ecount(net1), 0)  # 10% of those in net1

# add 10 new nodes
net2 <- add_vertices(net1, 10)

# get network distance for each dyad in net1 + the new nodes
spel <- data.table::melt(shortest.paths(net2))
names(spel) <- c("node_i", "node_j", "distance")

# replace inf with max observed value + 1
spel$distance[which(!is.finite(spel$distance))] <- max(spel$distance[is.finite(spel$distance)]) +1

# assign a probability (?) with a exponential decay function. Smallest distance == greatest prob.
spel$prob <- -0.5 * spel$distance^r   # is this what I need?
#hist(spel$prob, freq=T, xlab="Probability of tie-formation")
#hist(spel$distance, freq=T, xlab="Network Distance")

# lets sample new ties from this probability
spel$index <- seq_along(spel$prob)
to_build <- subset(spel, index %in% sample(spel$index, size = new_ties, prob=spel$prob))
net2 <- add_edges(net2, as.numeric(unlist(str_split(paste(to_build$node_i, to_build$node_j), " "))))

# save the network
write_graph(net2, paste0("D://network_sim_",i,".dl"), format="pajek")

print(proc.time()-time)
}

    
like image 253
wake_wake Avatar asked Jul 15 '20 13:07

wake_wake


1 Answers

I will try to answer this question, as far as I understand.

There are a couple of assumptions I made; I should clarify them.

First, what distribution will node weights follow?

If you are modeling an event that naturally occurs, it is most likely that the node weights follow a normal distribution. However, if the event is socially-oriented and other social mechanisms influence the event or the event popularity, the node weights might follow a different distribution-- mostly likely a power distribution.

Mainly, this is likely to true for customer-related behaviors. So, it would be beneficial for you to consider the random distribution you will model for the node weights.

For the following example, I use normal distributions to define value from a normal distribution for each node. At the end of each iteration, I let the node weights change up to %10 {.9,1.10}.

Second, what is the probability function of tie formation?

We have two inputs for making a decision: distance weights and node weights. So, we will create a function by using these two inputs and define probability weights. What I understood is that the smaller the distance is, the higher the likelihood is. And then the greater the node weight is, the higher the likelihood is, as well.

It might not be the best solution, but I did the followings:

First, calculate the decay function of distances and call it distance weights. Then, I get the node weights and create a super-linear function using both distance and node weights.

So, there are some parameters you can play with and see whether you get a result you want.

Btw, I did not change most of your codes. Also, I did not focus on processing time a lot. There are still rooms to impove.

library(scales)
library(stringr)
library(igraph)

# number of nodes and ties to start with
n <- 100
p <- 0.2
number_of_simulation <- 100

new_nodes <- 15 ## new nodes for each iteration


## Parameters ##

## How much distance will be weighted? 
## Exponential decay parameter
beta_distance_weight <- -.4

## probability function parameters for the distance and node weights 

impact_of_distances <- 0.3 ## how important is the distance weights?
impact_of_nodes <- 0.7     ## how important is the node weights?
power_base  <- 5.5         ## how important is having a high score? Prefential attachment or super-linear function

# build random network
net1 <- erdos.renyi.game(n, p, "gnp", directed = F)

# Assign normally distributed random weights
V(net1)$weight <- rnorm(vcount(net1))

graph_list <- list(net1)

for(i in seq(1,number_of_simulation,1)){

print(i) 
time <- proc.time()

net1 <- graph_list[[i]]

# how many will we build in next stage?
new_ties <- round(0.1*ecount(net1), 0)  # 10% of those in net1
# add 10 new nodes
net2 <- add_vertices(net1, new_nodes)

## Add random weights to new nodes from a normal distribution
V(net2)$weight[is.na(V(net2)$weight)] <- rnorm(new_nodes)

# get network distance for each dyad in net1 + the new nodes
spel <- reshape2::melt(shortest.paths(net2))
names(spel) <- c("node_i", "node_j", "distance")

# replace inf with max observed value + 1
spel$distance[which(!is.finite(spel$distance))] <- max(spel$distance[is.finite(spel$distance)]) +1

# Do not select nodes if they are self-looped or have already link
spel <- spel[!spel$distance %in% c(0,1) , ]

# Assign distance weights for each dyads
spel$distance_weight <- exp(beta_distance_weight*spel$distance)  

#hist(spel$distance_weight, freq=T, xlab="Probability of tie-formation")
#hist(spel$distance, freq=T, xlab="Network Distance")

## Get the node weights for merging the data with the distances 
node_weights <- data.frame(id= 1:vcount(net2),node_weight=V(net2)$weight)
spel <- merge(spel,node_weights,by.x='node_j',by.y='id')

## probability is the function of distince and node weight
spel$prob <- power_base^((impact_of_distances * spel$distance_weight) + (impact_of_nodes * spel$node_weight))
spel <- spel[order(spel$prob, decreasing = T),]

# lets sample new ties from this probability with a beta distribution 
spel$index <- seq_along(spel$prob)
to_build <- subset(spel, index %in% sample(spel$index, new_ties, p = 1/spel$index ))
net2 <- add_edges(net2, as.numeric(unlist(str_split(paste(to_build$node_i, to_build$node_j), " "))))

# change in the weights up to %10 
V(net2)$weight <- V(net2)$weight*rescale(rnorm(vcount(net2)), to = c(0.9, 1.1))

graph_list[[i+1]] <- net2

print(proc.time()-time)
}

To get the results or write the graph to Pajek, you can use the following:

lapply(seq_along(graph_list),function(x) write_graph(graph_list[[x]], paste0("network_sim_",x,".dl"), format="pajek"))

EDIT

To change the node weight, you can use the following syntax.

library(scales)
library(stringr)
library(igraph)

# number of nodes and ties to start with
n <- 100
p <- 0.2
number_of_simulation <- 100

new_nodes <- 10 ## new nodes for each iteration


## Parameters ##

## How much distance will be weighted? 
## Exponential decay parameter
beta_distance_weight <- -.4

## Node weights for power-law dist 
power_law_parameter <- -.08
## probability function parameters for the distance and node weights 

impact_of_distances <- 0.3 ## how important is the distance weights?
impact_of_nodes <- 0.7     ## how important is the node weights?
power_base  <- 5.5         ## how important is having a high score? Prefential attachment or super-linear function

# build random network
net1 <- erdos.renyi.game(n, p, "gnp", directed = F)

## MADE A CHANGE HERE 
# Assign normally distributed random weights
V(net1)$weight <- runif(vcount(net1))^power_law_parameter

graph_list <- list(net1)

for(i in seq(1,number_of_simulation,1)){

print(i) 
time <- proc.time()

net1 <- graph_list[[i]]

# how many will we build in next stage?
new_ties <- round(0.1*ecount(net1), 0)  # 10% of those in net1
# add 10 new nodes
net2 <- add_vertices(net1, new_nodes)

## Add random weights to new nodes from a normal distribution
V(net2)$weight[is.na(V(net2)$weight)] <- runif(new_nodes)^power_law_parameter

# get network distance for each dyad in net1 + the new nodes
spel <- reshape2::melt(shortest.paths(net2))
names(spel) <- c("node_i", "node_j", "distance")

# replace inf with max observed value + 1
spel$distance[which(!is.finite(spel$distance))] <- max(spel$distance[is.finite(spel$distance)]) + 2

# Do not select nodes if they are self-looped or have already link
spel <- spel[!spel$distance %in% c(0,1) , ]

# Assign distance weights for each dyads
spel$distance_weight <- exp(beta_distance_weight*spel$distance)  

#hist(spel$distance_weight, freq=T, xlab="Probability of tie-formation")
#hist(spel$distance, freq=T, xlab="Network Distance")

## Get the node weights for merging the data with the distances 
node_weights <- data.frame(id= 1:vcount(net2),node_weight=V(net2)$weight)
spel <- merge(spel,node_weights,by.x='node_j',by.y='id')

## probability is the function of distince and node weight
spel$prob <- power_base^((impact_of_distances * spel$distance_weight) + (impact_of_nodes * spel$node_weight))
spel <- spel[order(spel$prob, decreasing = T),]

# lets sample new ties from this probability with a beta distribution 
spel$index <- seq_along(spel$prob)
to_build <- subset(spel, index %in% sample(spel$index, new_ties, p = 1/spel$index ))
net2 <- add_edges(net2, as.numeric(unlist(str_split(paste(to_build$node_i, to_build$node_j), " "))))

# change in the weights up to %10 
V(net2)$weight <- V(net2)$weight*rescale(rnorm(vcount(net2)), to = c(0.9, 1.1))

graph_list[[i+1]] <- net2

print(proc.time()-time)
}

Result

So, to validate whether the code is working, I checked a small number of iteration with limited nodes: 10 iterations with 4 nodes. For each iteration, I added 3 new nodes and one new tie.

I did this simulation with three different settings.

The first setting focuses on only the weight function of distances: the more close nodes are, the more likely that a new tie will be formed between them.

The second setting focuses on only the weight function of node: the more weight nodes have, the more likely that a new tie will be formed with them.

The third setting focuses on the weight functions of both distance and node: the more weight nodes have and the more they are close, the more likely that a new tie will be formed with them.

Please observe the network behaviors how each setting provided different results.

  1. Only Distance Matters

enter image description here

  1. Only Node Weight Matters enter image description here

  2. Both Node Weight and Distance Matter

enter image description here

like image 195
mustafaakben Avatar answered Sep 30 '22 01:09

mustafaakben