Most efficient way to find neighbors of neighbors in Python

Consider two arrays, I and J, that together define the neighbor pairs:

I = np.array([0, 0, 1, 2, 2, 3])
J = np.array([1, 2, 0, 0, 3, 2])

This means element 0 has two neighbors, 1 and 2; element 1 has only 0 as a neighbor; and so on.

What is the most efficient way to create arrays of all neighbor triples I', J', K' such that j is a neighbor of i and k is a neighbor of j, given that i, j, and k are all different elements (i != j, j != k, and i != k)?

Ip = np.array([0, 1, 2, 3])
Jp = np.array([2, 0, 0, 2])
Kp = np.array([3, 2, 1, 0])

Of course, one way is to loop over each element. Is there a more efficient algorithm? (I am working with 10 to 500 million elements.)
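
For reference, the loop-based baseline mentioned above might look like the sketch below. It assumes the pairs in I and J are directed (i, j) entries as in the example; the names neighbors and triples are illustrative only.

```python
from collections import defaultdict

import numpy as np

I = np.array([0, 0, 1, 2, 2, 3])
J = np.array([1, 2, 0, 0, 3, 2])

# Adjacency list: neighbors[i] holds every j paired with i
neighbors = defaultdict(list)
for i, j in zip(I.tolist(), J.tolist()):
    neighbors[i].append(j)

# For every pair (i, j), extend it with each neighbor k of j,
# keeping only triples whose three elements are all different
triples = [
    (i, j, k)
    for i, j in zip(I.tolist(), J.tolist())
    for k in neighbors.get(j, [])
    if i != j and j != k and i != k
]

# Split into three arrays (this assumes at least one triple was found)
Ip, Jp, Kp = np.array(triples).T
```

This is linear in the number of generated triples, but the Python-level loop is what becomes slow at hundreds of millions of elements.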

Roy asked Mar 02 '21 07:03


3 Answers

I would go with a very simple approach and use pandas (I and J are your numpy arrays):

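import pandas as pd

# df1 holds the pairs (I -> J); df2 holds the same pairs relabelled as (K -> I)
df1 = pd.DataFrame({'I': I, 'J': J})
df2 = df1.rename(columns={'I': 'K', 'J': 'I'})

# Joining on the shared middle element I gives paths K -> I -> J;
# the query drops the paths that return to their starting element
result = pd.merge(df2, df1, on='I').query('K != J')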
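
As a quick check on the arrays from the question, the columns of result can be read back as the three output arrays. This is only a minimal sketch: the names Ip, Jp, Kp are taken from the question, and the sort_values call is an assumption added purely to make the row order deterministic.

```python
import numpy as np
import pandas as pd

I = np.array([0, 0, 1, 2, 2, 3])
J = np.array([1, 2, 0, 0, 3, 2])

df1 = pd.DataFrame({'I': I, 'J': J})
df2 = df1.rename(columns={'I': 'K', 'J': 'I'})

result = (
    pd.merge(df2, df1, on='I')
    .query('K != J')                # drop paths that return to their start
    .sort_values(['K', 'I', 'J'])   # assumption: sorted only for a stable order
)

# Column K is the first element, I the middle one, J the last one
Ip = result['K'].to_numpy()
Jp = result['I'].to_numpy()
Kp = result['J'].to_numpy()

print(Ip)  # [0 1 2 3]
print(Jp)  # [2 0 0 2]
print(Kp)  # [3 2 1 0]
```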

The advantage is that pandas.merge relies on a very fast underlying numerical implementation. You can often make the computation faster still, for example by merging on indexes instead of columns.

To reduce the memory this approach needs, it is probably useful to shrink df1 and df2 before merging them, for example by changing the dtype of their columns to the smallest one that fits your data.
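
Merging on indexes could look roughly like the sketch below. It is an illustration rather than a benchmarked recommendation: the names left, right, by_index and by_column are hypothetical, and whether it is actually faster depends on the data and the pandas version.

```python
import numpy as np
import pandas as pd

I = np.array([0, 0, 1, 2, 2, 3])
J = np.array([1, 2, 0, 0, 3, 2])

df1 = pd.DataFrame({'I': I, 'J': J})
df2 = df1.rename(columns={'I': 'K', 'J': 'I'})

# Move the join key into the index of both frames
left = df2.set_index('I')
right = df1.set_index('I')

by_index = (
    pd.merge(left, right, left_index=True, right_index=True)
    .rename_axis('I')   # make sure the index is named before restoring it
    .reset_index()
    .query('K != J')
)

# Column-based merge from the answer, for comparison
by_column = pd.merge(df2, df1, on='I').query('K != J')
```

Both frames should contain the same set of (K, I, J) rows, possibly in a different order.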

Here is an example that compares the run time of the merge for 64-bit and 32-bit integer columns:

from timeit import timeit
import numpy as np
import pandas as pd

I = np.random.randint(0, 10000, 1000000)
J = np.random.randint(0, 10000, 1000000)

df1_64 = pd.DataFrame({'I': I, 'J': J})
df1_32 = df1_64.astype('int32')
df2_64 = df1_64.rename(columns={'I': 'K', 'J': 'I'})
df2_32 = df1_32.rename(columns={'I': 'K', 'J': 'I'})

# timeit returns the elapsed time in seconds
timeit(lambda: pd.merge(df2_64, df1_64, on='I').query('K != J'), number=1)
# 18.84 (int64 columns)
timeit(lambda: pd.merge(df2_32, df1_32, on='I').query('K != J'), number=1)
# 9.28 (int32 columns)

Riccardo Bucco answered Oct 25 '22 15:10


There is no particularly magic algorithm to generate all of the triples. You can avoid re-fetching a node's neighbors by an orderly search, but that's about it.

  • Make an empty list, N, of nodes to check.
  • Add some start node, S, to N.
  • While N is not empty:
    • Pop a node off the list; call it A.
    • Make a set of its neighbors, A'.
    • For each neighbor B of A:
      • For each element a of A' (other than B itself):
        • Generate the triple (a, A, B).
      • Add B to the list of nodes to check, if it has not already been checked.

Does that help? There are still several details to handle in the algorithm above, such as avoiding duplicate generation, and fine points of moving through cliques.
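
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a tuned implementation: the neighbor pairs are assumed to be symmetric (as in the question's example) with no self-loops, the function name triples_from is made up for this sketch, and only the part of the graph reachable from the start node is covered.

```python
from collections import defaultdict, deque


def triples_from(start, neighbors):
    """Yield (a, A, B) for every node A reachable from `start`, with a != B."""
    seen = {start}
    queue = deque([start])          # the list N of nodes still to check
    while queue:
        A = queue.popleft()
        nbrs = neighbors[A]         # A', fetched once per node
        for B in nbrs:
            for a in nbrs:
                if a != B:          # first and last element must differ
                    yield (a, A, B)
            if B not in seen:       # queue each node at most once
                seen.add(B)
                queue.append(B)


I = [0, 0, 1, 2, 2, 3]
J = [1, 2, 0, 0, 3, 2]

neighbors = defaultdict(set)
for i, j in zip(I, J):
    neighbors[i].add(j)

triples = sorted(triples_from(0, neighbors))
print(triples)  # [(0, 2, 3), (1, 0, 2), (2, 0, 1), (3, 2, 0)]
```

Because each node is used as the middle element exactly once, no triple is generated twice. For a graph with several disconnected parts, the traversal would need to be restarted from an unseen node in each part.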

Prune answered Oct 25 '22 16:10


What you are looking for is all simple paths with 3 vertices (that is, 2 edges) in the graph. You can find them with the following recursive algorithm:

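import numpy as np
import networkx as nx

def findPaths(G, u, n):
    """Returns a list of all simple paths with `n` vertices starting at vertex `u`."""
    if n == 1:
        return [[u]]
    # Extend every (n-1)-vertex path from each neighbor, skipping paths that revisit u
    paths = [[u] + path for neighbor in G.neighbors(u) for path in findPaths(G, neighbor, n - 1) if u not in path]
    return paths

# Generating graph (I and J are the arrays from the question).
# nx.Graph is undirected, which matches the symmetric pairs in the example;
# converting to plain Python ints keeps the printed output readable.
vertices = np.unique(I).tolist()
edges = list(zip(I.tolist(), J.tolist()))
G = nx.Graph()
G.add_edges_from(edges)

# Grabbing all 3-vertex paths
paths = [path for v in vertices for path in findPaths(G, v, 3)]
print(paths)
# [[0, 2, 3], [1, 0, 2], [2, 0, 1], [3, 2, 0]]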

iacob answered Oct 25 '22 16:10