Let's consider there are two arrays I and J which determine the neighbor pairs:
I = np.array([0, 0, 1, 2, 2, 3])
J = np.array([1, 2, 0, 0, 3, 2])
This means that element 0 has two neighbors, 1 and 2; element 1 has only 0 as a neighbor; and so on.
What is the most efficient way to create arrays of all neighbor triples Ip, Jp, Kp such that j is a neighbor of i and k is a neighbor of j, given the condition i != j != k (that is, i != j and j != k)?
Ip = np.array([0, 0, 2, 3])
Jp = np.array([2, 2, 0, 2])
Kp = np.array([0, 3, 1, 0])
Of course, one way is to loop over each element. Is there a more efficient algorithm? (I am working with 10-500 million elements.)
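For reference, the brute-force loop mentioned above can be sketched as follows. This is a naive baseline, not the efficient solution being asked for, and it assumes the condition means i != j and j != k (so i may equal k):

```python
import numpy as np
from collections import defaultdict

I = np.array([0, 0, 1, 2, 2, 3])
J = np.array([1, 2, 0, 0, 3, 2])

# Build an adjacency list once so neighbors are not re-scanned per element.
neighbors = defaultdict(list)
for i, j in zip(I, J):
    neighbors[i].append(j)

# Enumerate all (i, j, k) with j a neighbor of i and k a neighbor of j.
triples = [(i, j, k)
           for i in neighbors
           for j in neighbors[i]
           for k in neighbors[j]
           if i != j and j != k]

Ip, Jp, Kp = (np.array(t) for t in zip(*triples))
```

This is O(sum of degree products) in time, which is why a vectorized approach is needed at the 10-500 million element scale.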
I would go with a very simple approach and use pandas (I and J are your numpy arrays):
import pandas as pd
df1 = pd.DataFrame({'I': I, 'J': J})
df2 = df1.rename(columns={'I': 'K', 'J': 'I'})
result = pd.merge(df2, df1, on='I').query('K != J')
The advantage is that pandas.merge relies on a very fast underlying numerical implementation. You can also make the computation faster, for example by merging using indexes.
To reduce the memory this approach needs, it is probably useful to shrink df1 and df2 before merging them (for example, by changing the dtype of their columns to the smallest type that suits your data).
Here is an example of how to optimize speed and memory of the computation:
from timeit import timeit
import numpy as np
import pandas as pd
I = np.random.randint(0, 10000, 1000000)
J = np.random.randint(0, 10000, 1000000)
df1_64 = pd.DataFrame({'I': I, 'J': J})
df1_32 = df1_64.astype('int32')
df2_64 = df1_64.rename(columns={'I': 'K', 'J': 'I'})
df2_32 = df1_32.rename(columns={'I': 'K', 'J': 'I'})
timeit(lambda: pd.merge(df2_64, df1_64, on='I').query('K != J'), number=1)
# 18.84
timeit(lambda: pd.merge(df2_32, df1_32, on='I').query('K != J'), number=1)
# 9.28
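As an illustration of the index-based merge mentioned above (a sketch on the question's small example; the `join` variant is an assumption about what helps, and actual timings will vary):

```python
import numpy as np
import pandas as pd

I = np.array([0, 0, 1, 2, 2, 3])
J = np.array([1, 2, 0, 0, 3, 2])

df1 = pd.DataFrame({'I': I, 'J': J})
df2 = df1.rename(columns={'I': 'K', 'J': 'I'})

# Same merge expressed as an index join: put 'I' in the index of both frames
# and let pandas align on it (duplicate index values produce all pairings).
result = (df2.set_index('I')
             .join(df1.set_index('I'), how='inner')
             .reset_index()
             .query('K != J'))
```

The resulting rows (K, I, J) are the triples k-i-j where K-I and I-J are both neighbor pairs and K != J.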
There is no particularly magic algorithm to generate all of the triples. You can avoid re-fetching a node's neighbors by an orderly search, but that's about it.
Does that help? There are still several details to handle in the algorithm above, such as avoiding duplicate generation, and fine points of moving through cliques.
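One way to handle the duplicate-generation detail mentioned above is to canonicalize each path: since an undirected 3-path i-j-k is the same path as k-j-i, you can emit only the orientation with i < k. A minimal sketch, using a hand-built adjacency list for the example graph:

```python
# Adjacency list for the example graph (edges 0-1, 0-2, 2-3, undirected).
adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}

# Keep only the orientation with i < k, so each undirected
# 3-path is generated exactly once and i, k are distinct.
triples = [(i, j, k)
           for i, js in adj.items()
           for j in js
           for k in adj[j]
           if k != i and i < k]
```

For this graph the two unique undirected 3-paths are 0-2-3 and 1-0-2.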
What you are looking for is all paths on 3 vertices (2 edges) in the graph. You can generate these with the following recursive algorithm:
import networkx as nx
import numpy as np

def findPaths(G, u, n):
    """Returns a list of all paths on `n` vertices starting at vertex `u`."""
    if n == 1:
        return [[u]]
    paths = [[u] + path
             for neighbor in G.neighbors(u)
             for path in findPaths(G, neighbor, n - 1)
             if u not in path]
    return paths

# Generating the graph
vertices = np.unique(I)
edges = list(zip(I, J))
G = nx.Graph()
G.add_edges_from(edges)

# Grabbing all 3-paths
paths = [path for v in vertices for path in findPaths(G, v, 3)]
paths
>>> [[0, 2, 3], [1, 0, 2], [2, 0, 1], [3, 2, 0]]
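If you need the result in the same Ip, Jp, Kp array form as the question, a small follow-up sketch: unzip the path list into columns.

```python
import numpy as np

# Output of the findPaths-based enumeration above for the example graph.
paths = [[0, 2, 3], [1, 0, 2], [2, 0, 1], [3, 2, 0]]

# Each path is [i, j, k]; zip(*paths) transposes the list into three columns.
Ip, Jp, Kp = (np.array(col) for col in zip(*paths))
# Ip -> array([0, 1, 2, 3]), Jp -> array([2, 0, 0, 2]), Kp -> array([3, 2, 1, 0])
```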