Given a random simple graph of N nodes, N edges, and a uniform edge probability p, what is the expected size of the largest connected component?
{(1,a), (2,b), (3,c), ..., (N,[next available letter of the alphabet])}
. The conditions are that a!=1, b!=2, c!=3 etc. but other than that, a,b,c,... can take any value from 1 to N inclusiveThe question is: among all such valid graphs constructed, what is the expected value of S?
I would like to know of an efficient way to think about/analyze/solve this problem that can deal with N ranging from 2 to 50 inclusive. Some code to illustrate would be helpful to my understanding.
I wrote the following source codes to naively bruteforce the possibilities, hoping to find a pattern from there. Managed to get up to N = 9
with these:
Both attempt to generate all possible permutations according to the criteria specified in the problem, and then calculate S for each graph generated via BFS.
Output so far
As for the format, eg. for the line "N=4: {2:3,4:78} 106/27
", this means that there are 3 graphs with the size of the largest component = 2, and 78 graphs with the size of the largest component = 4, so the final answer = (3/(78+3))*2 + (78/(78+3))*4 = 106/27
.
N=2: {2:1} 2/1
N=3: {3:8} 3/1
N=4: {2:3,4:78} 106/27
N=5: {3:80,5:944} 155/32
N=6: {2:15,3:640,4:1170,6:13800} 17886/3125
N=7: {3:840,4:21840,5:19824,7:237432} 38563/5832
N=8: {2:105,3:17920,4:229320,5:422912,6:386400,8:4708144} 6152766/823543
N=9: {3:153440,4:786240,5:9634464,6:9273600,7:8547552,9:105822432} 17494593/2097152
edit: Just discovered this OEIS sequence #A000435 gives the answers (formula/listing up to N=18) for the number of graphs with N nodes and largest size component = N. This is exactly coincidental with my bruteforce output, eg. for N=9, there are 105822432 graphs with the largest size of connected component = 9. Not sure if this helps - one of the formulas given there to derive this for larger N is a(n) = (n-1)! * Sum (n^k/k!, k=0..n-2)
Python implementation
Example for N = 4
:
There are a total of 81 possible edge assignments for 4 nodes (1-based indexed from 1 to N).
The format for the sample output below is: ((1, 2), (2, 1), (3, 1), (4, 1)): 4
means that there are edges between nodes 1 and 2, nodes 2 and 1, nodes 3 and 1, and nodes 4 and 1. Assume the edges are undirected/bidirectional. For such a graph, there is only a single connected component of all 4 nodes {1,2,3,4}, so the size of the largest (only) connected component is 4.
After generating this list, expected value of S can be computed via (fraction of all 81 instances where
S== 4) * 4 + (fraction of all 81 instances where
S== 2) * 2 =
106/27 - since the only values S takes are 2,4.
{((1, 2), (2, 1), (3, 1), (4, 1)): 4,
((1, 2), (2, 1), (3, 1), (4, 2)): 4,
((1, 2), (2, 1), (3, 1), (4, 3)): 4,
((1, 2), (2, 1), (3, 2), (4, 1)): 4,
((1, 2), (2, 1), (3, 2), (4, 2)): 4,
((1, 2), (2, 1), (3, 2), (4, 3)): 4,
((1, 2), (2, 1), (3, 4), (4, 1)): 4,
((1, 2), (2, 1), (3, 4), (4, 2)): 4,
((1, 2), (2, 1), (3, 4), (4, 3)): 2,
((1, 2), (2, 3), (3, 1), (4, 1)): 4,
((1, 2), (2, 3), (3, 1), (4, 2)): 4,
((1, 2), (2, 3), (3, 1), (4, 3)): 4,
((1, 2), (2, 3), (3, 2), (4, 1)): 4,
((1, 2), (2, 3), (3, 2), (4, 2)): 4,
((1, 2), (2, 3), (3, 2), (4, 3)): 4,
((1, 2), (2, 3), (3, 4), (4, 1)): 4,
((1, 2), (2, 3), (3, 4), (4, 2)): 4,
((1, 2), (2, 3), (3, 4), (4, 3)): 4,
((1, 2), (2, 4), (3, 1), (4, 1)): 4,
((1, 2), (2, 4), (3, 1), (4, 2)): 4,
((1, 2), (2, 4), (3, 1), (4, 3)): 4,
((1, 2), (2, 4), (3, 2), (4, 1)): 4,
((1, 2), (2, 4), (3, 2), (4, 2)): 4,
((1, 2), (2, 4), (3, 2), (4, 3)): 4,
((1, 2), (2, 4), (3, 4), (4, 1)): 4,
((1, 2), (2, 4), (3, 4), (4, 2)): 4,
((1, 2), (2, 4), (3, 4), (4, 3)): 4,
((1, 3), (2, 1), (3, 1), (4, 1)): 4,
((1, 3), (2, 1), (3, 1), (4, 2)): 4,
((1, 3), (2, 1), (3, 1), (4, 3)): 4,
((1, 3), (2, 1), (3, 2), (4, 1)): 4,
((1, 3), (2, 1), (3, 2), (4, 2)): 4,
((1, 3), (2, 1), (3, 2), (4, 3)): 4,
((1, 3), (2, 1), (3, 4), (4, 1)): 4,
((1, 3), (2, 1), (3, 4), (4, 2)): 4,
((1, 3), (2, 1), (3, 4), (4, 3)): 4,
((1, 3), (2, 3), (3, 1), (4, 1)): 4,
((1, 3), (2, 3), (3, 1), (4, 2)): 4,
((1, 3), (2, 3), (3, 1), (4, 3)): 4,
((1, 3), (2, 3), (3, 2), (4, 1)): 4,
((1, 3), (2, 3), (3, 2), (4, 2)): 4,
((1, 3), (2, 3), (3, 2), (4, 3)): 4,
((1, 3), (2, 3), (3, 4), (4, 1)): 4,
((1, 3), (2, 3), (3, 4), (4, 2)): 4,
((1, 3), (2, 3), (3, 4), (4, 3)): 4,
((1, 3), (2, 4), (3, 1), (4, 1)): 4,
((1, 3), (2, 4), (3, 1), (4, 2)): 2,
((1, 3), (2, 4), (3, 1), (4, 3)): 4,
((1, 3), (2, 4), (3, 2), (4, 1)): 4,
((1, 3), (2, 4), (3, 2), (4, 2)): 4,
((1, 3), (2, 4), (3, 2), (4, 3)): 4,
((1, 3), (2, 4), (3, 4), (4, 1)): 4,
((1, 3), (2, 4), (3, 4), (4, 2)): 4,
((1, 3), (2, 4), (3, 4), (4, 3)): 4,
((1, 4), (2, 1), (3, 1), (4, 1)): 4,
((1, 4), (2, 1), (3, 1), (4, 2)): 4,
((1, 4), (2, 1), (3, 1), (4, 3)): 4,
((1, 4), (2, 1), (3, 2), (4, 1)): 4,
((1, 4), (2, 1), (3, 2), (4, 2)): 4,
((1, 4), (2, 1), (3, 2), (4, 3)): 4,
((1, 4), (2, 1), (3, 4), (4, 1)): 4,
((1, 4), (2, 1), (3, 4), (4, 2)): 4,
((1, 4), (2, 1), (3, 4), (4, 3)): 4,
((1, 4), (2, 3), (3, 1), (4, 1)): 4,
((1, 4), (2, 3), (3, 1), (4, 2)): 4,
((1, 4), (2, 3), (3, 1), (4, 3)): 4,
((1, 4), (2, 3), (3, 2), (4, 1)): 2,
((1, 4), (2, 3), (3, 2), (4, 2)): 4,
((1, 4), (2, 3), (3, 2), (4, 3)): 4,
((1, 4), (2, 3), (3, 4), (4, 1)): 4,
((1, 4), (2, 3), (3, 4), (4, 2)): 4,
((1, 4), (2, 3), (3, 4), (4, 3)): 4,
((1, 4), (2, 4), (3, 1), (4, 1)): 4,
((1, 4), (2, 4), (3, 1), (4, 2)): 4,
((1, 4), (2, 4), (3, 1), (4, 3)): 4,
((1, 4), (2, 4), (3, 2), (4, 1)): 4,
((1, 4), (2, 4), (3, 2), (4, 2)): 4,
((1, 4), (2, 4), (3, 2), (4, 3)): 4,
((1, 4), (2, 4), (3, 4), (4, 1)): 4,
((1, 4), (2, 4), (3, 4), (4, 2)): 4,
((1, 4), (2, 4), (3, 4), (4, 3)): 4}
The maximum number of edges possible in a single graph with 'n' vertices is nC2 where nC2 = n(n – 1)/2. The number of simple graphs possible with 'n' vertices = 2nc2 = 2n(n-1)/2.
If you have N nodes, there are N - 1 directed edges than can lead from it (going to every other node). Therefore, the maximum number of edges is N * (N - 1) .
Before looking at the probabilities, let us look at a similar problem in combinatorics, i.e. given the rules of the graphs and N
nodes, how many graphs can be attained when the largest connected component is S
.
For the rest on this answer I will use the following notation:
C(S=s|N)
notes the total count of possible graphs with N
nodes and N
edges as defined in the question with the largest connects component is S=s
.
Examples:
T(s)
notes the number of graphs with s
nodes and s
edges as defined in the question where all nodes are connected. T(s) = C(S=s|s)
.
Examples
T(2) = 1
T(3) = 8
Some key observation to lay out the stages of the solution:
C(S=N-1|N) = 0
since the last node must be connected to itself. This is however not a legal graph and therefore the count for this case is 0
C(S=2|N)
is different than 0
only for even values of N
. If N
is odd, there has to be a node connected to itself and again, this is illegal.C(S=2|N) = (N choose 2) * C(S=2|N-2)
. This also correlates with 2. Since this recursion will eventually reach C(S=2|1) =0 and the entire expression will be 0
for odd N
values.C(S=2|N) = (N choose 2) * T(2) * C(S=2|N-2)
since T(2) = 1
this holds.C(S=s|N) = (N choose s) * T(s) * [C(S=2|N-s) + C(S=3|N-s) + ... + C(S=s|N-s)]
Note that these expressions include double counts, we will get to that later.
This means that we found a recursive pattern to compute the number of graph (up to double counts), all we need is a method to compute T(s).
For a N nodes N edges graph as defined in the question, T(N) can be computed via three components:
(N choose N-1) * T(N-1) * (N-1)
: The fully connected structure has a N-1 connected graph and the new node is connected to either of the N-1 nodes creating a N connected graph. This section contains double counts.Therefore we can compute T(s)
as follows (the explanation for the double counts in this case is too complex for me to write down an explanation):
We note that for T(2) and T(3) we do not have double counts.
Examples
T(3) = (3 choose 2)*T(2)*2 + 2! = 3*1*2+2=8
T(4) = (4 choose 3)*T(3)*3 + 3! - (4 choose 2)*T(2)*2^2 = 4×8×3 +6 -6*1*4=78
T(5) = (5 choose 4)*T(4)*4 + 4! - (5 choose 3)*T(3)*3^2 + (5 choose 2)*T(2)*2^3 = 5*78*4 +24 -10*8*9 +10*1*2^3 = 944
The only thing left is to find a way to overcome double counts in key observation 5. This is done using the link above. After breaking N
to a different groups, we correct the double count by dividing the result by a factor to complete it.
Examples
N=6
nodes and we computeC(S=2|6) = (6 choose 2)*T(2)*C(S=2|4) = (6 choose 2)*T(2)*(4 choose 2)*T(2)
We can see that we divided 6
into three groups of 2
therefore we need to correct the double count by applying a factor of 3!
:
C(S=2|6) = (6 choose 2)*T(2)*(4 choose 2)*T(2)/(3!) = 15
N=9
nodes and we computeC(S=3|9) = (9 choose 3)*T(3)*[C(S=2|6) + C(S=3|6)]
We can see that the first component of the brackets, i.e. C(S=2|6)
was computed in example 1, and a factor of 3!
was divided from the result because 6
was divided into three groups of 2
. for the second component of the brackets we get:
C(S=3|9) = (9 choose 3)*T(3)*[15 + (6 choose 3)*T(3)*C(S=3|3)] = (9 choose 3)*T(3)*[15 + (6 choose 3)*T(3)*T(3)]
We see that 9
was divided into three groups of 3
and therefore we add a factor of 3!
to counter the double counts:
C(S=3|9) = (9 choose 3)*T(3)*[15 + (6 choose 3)*T(3)*T(3)/(3!)] = 153440
Everything is now set up to compute C(S=s|N)
for all s
valuees between 2
and N
. Then the probability can achieved by dividing by the sum of all graphs:
P(S=s|N) = C(S=s|N) / sum_s{C(S=s|N)}
And the expected value can be found via:
E[S|N] = sum_s{s * P(S=s|N)}
Below I will show a python implementation for the method derived above. I will compute the values including the double counts to keep the recursive property, thus allowing me to compute all the graph counts in O(N^2) and I will keep record of the splitting to know how to correct the double counts later on. The code for the counts with comments is as follows:
import numpy as np
from math import comb
from collections import Counter
class ExpectedSize:
def __init__(self, N):
self.N = N
self.counts = None
self.group_dict = None
self.corrected_counts = None
self.expected_size = None
def get_T(self):
"""
:param N: Number of nodes and edges
:return: A vector where the ith location contains the number of graphs with i nodes and i edges where all the nodes
are connected
"""
T = np.zeros(self.N + 1)
for ii in range(len(T)):
if ii < 2:
T[ii] = 0
elif ii == 2:
T[ii] = 1
else:
n_vec = np.arange(1, ii-2)
sign_vec = (-1) ** n_vec
choose_vec = [comb(ii, ii-1-jj) for jj in n_vec]
t_vec = T[ii-2:1:-1]
power_vec = (ii -1 - n_vec) ** (n_vec + 1)
T[ii] = ii * T[ii - 1] * (ii - 1) + np.math.factorial(ii - 1) + np.sum(sign_vec * choose_vec * t_vec * power_vec) # X choose (X-1) = X
return T
def get_counts(self):
# Init
T_s = self.get_T()
# counts = np.zeros((self.N + 1, self.N + 1))
# counts[:, 0] = 1
# counts[2, 2] = 1
counts = {(2,2): [1]}
for ii in range(self.N + 1):
counts[(ii, 0)] = [1]
group_dict = {(2,2): [[2]]}
for NN in range(3, self.N + 1):
for ss in range(2, NN+1):
# Computing C(S = ss | ii) without the double-counting correction
init_term = comb(NN, ss) * T_s[ss]
# creating new split list
split_list = []
count_list = []
for sigma in range(2, ss+1):
key = (sigma, NN-ss)
if NN-ss == 0 and sigma > 2:
break
# Getting sum components
temp_components = counts.get((sigma, NN-ss), [0])
# Updating split counts
group_count = group_dict.get(key, [[]])
# appending split list to the total list if needed
for gg in group_count:
if temp_components[0] > 0:
split_list.append(gg + [ss])
group_dict[(ss, NN)] = split_list
count_list += [temp_components[0]]
counts[(ss, NN)] = [init_term * cc for cc in count_list] if len(count_list) > 0 else [0]
self.counts = counts
self.group_dict = group_dict
def correct_double_counts(self):
counts = self.counts
group_dict = self.group_dict
corrected_counts = {}
for key in counts:
group_count = group_dict.get(key, [[]])
sum_components = counts[key]
corrected_temp = []
for ii, sum_component in enumerate(sum_components):
counter_dict = Counter(group_count[ii])
corrected_temp += [sum_component]
for split_key in counter_dict:
corrected_temp[-1] = corrected_temp[-1]/ np.math.factorial(counter_dict[split_key])
corrected_counts[key] = sum(corrected_temp)
self.corrected_counts = corrected_counts
def get_expected_size(self):
cumsum = 0
expected = 0
for ii in range(1, self.N+1):
key = (ii, self.N)
cumsum += self.corrected_counts[key]
expected += ii * self.corrected_counts[key]
self.expected_size = expected / cumsum
Running the following code for N=9
results in:
exp_size = ExpectedSize(9)
exp_size.get_counts()
exp_size.correct_double_counts()
print(exp_size.corrected_counts)
(2, 2): 1.0,
(2, 3): 0, (3, 3): 8.0,
(2, 4): 3.0, (3, 4): 0, (4, 4): 78.0,
(2, 5): 0, (3, 5): 80.0, (4, 5): 0, (5, 5): 944.0,
(2, 6): 15.0, (3, 6): 640.0, (4, 6): 1170.0, (5, 6): 0, (6, 6): 13800.0,
(2, 7): 0, (3, 7): 840.0, (4, 7): 21840.0, (5, 7): 19824.0, (6, 7): 0, (7, 7): 237432.0,
(2, 8): 105.0, (3, 8): 17920.0, (4, 8): 229320.0, (5, 8): 422912.0, (6, 8): 386400.0, (7, 8): 0, (8, 8): 4708144.0,
(2, 9): 0, (3, 9): 153440.0, (4, 9): 786240.0, (5, 9): 9634464.0, (6, 9): 9273600.0, (7, 9): 8547552.0, (8, 9): 0, (9, 9): 105822432.0}
We can see that the counts match the brute force computations of OP (the keys are (S,N)
.
Now the expectation computation can be done with ease using the get_expected_size
function
Examples
N=2
--> E[S|N] = 2
N=3
--> E[S|N] = 3
N=4
--> E[S|N] = 3.925925925925926
; 126 / 27 = 3.925925926
N=5
--> E[S|N] = 4.84375
; 155 / 32 = 4.84375
N=6
--> E[S|N] = 5.72352
; 17886 / 3125 = 5.72352
N=7
--> E[S|N] = 6.612311385459534
; 38563 / 5832 = 6.612311385
N=8
--> E[S|N] = 7.471092584115219
; 6152766 / 823543 = 7.471092584
N=9
--> E[S|N] = 8.342072010040283
; 17494593 / 2097152 = 8.34207201
N=10
--> E[S|N] = 9.189007166835722
N=11
--> E[S|N] = 10.047272752
N=12
--> E[S|N] = 10.891595346262637
And we can see that the overall results are accurate w.r.t the brute force results up to precision point of Python. I this is a sufficient solution.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With