Let's consider there are two arrays I and J which determine the neighbor pairs:
I = np.array([0, 0, 1, 2, 2, 3])
J = np.array([1, 2, 0, 0, 3, 2])
This means that element 0 has two neighbors, 1 and 2; element 1 has only 0 as a neighbor; and so on.
What is the most efficient way to create arrays of all neighbor triples Ip, Jp, Kp such that j is a neighbor of i and k is a neighbor of j, given the condition i != j != k (that is, i != j and j != k)?
Ip = np.array([0, 0, 2, 3])
Jp = np.array([2, 2, 0, 2])
Kp = np.array([0, 3, 1, 0])
Of course, one way is to loop over each element. Is there a more efficient algorithm? (I am working with 10-500 million elements.)
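For reference, the brute-force loop mentioned above can be sketched as follows. This is a naive baseline, not the efficient solution being asked for, and it assumes the condition means i != j and j != k (so i may equal k):

```python
import numpy as np
from collections import defaultdict

I = np.array([0, 0, 1, 2, 2, 3])
J = np.array([1, 2, 0, 0, 3, 2])

# Build an adjacency list once so neighbors are not re-scanned per element.
neighbors = defaultdict(list)
for i, j in zip(I, J):
    neighbors[i].append(j)

# Enumerate all (i, j, k) with j a neighbor of i and k a neighbor of j.
triples = [(i, j, k)
           for i in neighbors
           for j in neighbors[i]
           for k in neighbors[j]
           if i != j and j != k]

Ip, Jp, Kp = (np.array(t) for t in zip(*triples))
```

This is O(sum of degree products) in time, which is why a vectorized approach is needed at the 10-500 million element scale.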
I would go with a very simple approach and use pandas (I and J are your numpy arrays):
import pandas as pd
df1 = pd.DataFrame({'I': I, 'J': J})
df2 = df1.rename(columns={'I': 'K', 'J': 'I'})
result = pd.merge(df2, df1, on='I').query('K != J')
The advantage is that pandas.merge relies on a very fast underlying numerical implementation. You can also make the computation faster, for example by merging using indexes.
To reduce the memory this approach needs, it is probably useful to shrink df1 and df2 before merging them (for example, by changing the dtype of their columns to the smallest type that suits your data).
Here is an example of how to optimize speed and memory of the computation:
from timeit import timeit
import numpy as np
import pandas as pd
I = np.random.randint(0, 10000, 1000000)
J = np.random.randint(0, 10000, 1000000)
df1_64 = pd.DataFrame({'I': I, 'J': J})
df1_32 = df1_64.astype('int32')
df2_64 = df1_64.rename(columns={'I': 'K', 'J': 'I'})
df2_32 = df1_32.rename(columns={'I': 'K', 'J': 'I'})
timeit(lambda: pd.merge(df2_64, df1_64, on='I').query('K != J'), number=1)
# 18.84
timeit(lambda: pd.merge(df2_32, df1_32, on='I').query('K != J'), number=1)
# 9.28
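As an illustration of the index-based merge mentioned above (a sketch on the question's small example; the `join` variant is an assumption about what helps, and actual timings will vary):

```python
import numpy as np
import pandas as pd

I = np.array([0, 0, 1, 2, 2, 3])
J = np.array([1, 2, 0, 0, 3, 2])

df1 = pd.DataFrame({'I': I, 'J': J})
df2 = df1.rename(columns={'I': 'K', 'J': 'I'})

# Same merge expressed as an index join: put 'I' in the index of both frames
# and let pandas align on it (duplicate index values produce all pairings).
result = (df2.set_index('I')
             .join(df1.set_index('I'), how='inner')
             .reset_index()
             .query('K != J'))
```

The resulting rows (K, I, J) are the triples k-i-j where K-I and I-J are both neighbor pairs and K != J.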
There is no particularly magic algorithm to generate all of the triples. You can avoid re-fetching a node's neighbors by an orderly search, but that's about it.
Does that help? There are still several details to handle in the algorithm above, such as avoiding duplicate generation, and fine points of moving through cliques.
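One way to handle the duplicate-generation detail mentioned above is to canonicalize each path: since an undirected 3-path i-j-k is the same path as k-j-i, you can emit only the orientation with i < k. A minimal sketch, using a hand-built adjacency list for the example graph:

```python
# Adjacency list for the example graph (edges 0-1, 0-2, 2-3, undirected).
adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}

# Keep only the orientation with i < k, so each undirected
# 3-path is generated exactly once and i, k are distinct.
triples = [(i, j, k)
           for i, js in adj.items()
           for j in js
           for k in adj[j]
           if k != i and i < k]
```

For this graph the two unique undirected 3-paths are 0-2-3 and 1-0-2.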
What you are looking for is all paths on 3 vertices (2 edges) in the graph. You can generate these with the following recursive algorithm:
import networkx as nx
import numpy as np

def findPaths(G, u, n):
    """Returns a list of all paths on `n` vertices starting at vertex `u`."""
    if n == 1:
        return [[u]]
    paths = [[u] + path
             for neighbor in G.neighbors(u)
             for path in findPaths(G, neighbor, n - 1)
             if u not in path]
    return paths

# Generating the graph
vertices = np.unique(I)
edges = list(zip(I, J))
G = nx.Graph()
G.add_edges_from(edges)

# Grabbing all 3-paths
paths = [path for v in vertices for path in findPaths(G, v, 3)]
paths
>>> [[0, 2, 3], [1, 0, 2], [2, 0, 1], [3, 2, 0]]
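If you need the result in the same Ip, Jp, Kp array form as the question, a small follow-up sketch: unzip the path list into columns.

```python
import numpy as np

# Output of the findPaths-based enumeration above for the example graph.
paths = [[0, 2, 3], [1, 0, 2], [2, 0, 1], [3, 2, 0]]

# Each path is [i, j, k]; zip(*paths) transposes the list into three columns.
Ip, Jp, Kp = (np.array(col) for col in zip(*paths))
# Ip -> array([0, 1, 2, 3]), Jp -> array([2, 0, 0, 2]), Kp -> array([3, 2, 1, 0])
```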