Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Replace data of an array by two values of a second array

I have two numpy arrays "Elements" and "nodes". My aim is to gather some data of these arrays. I need to replace "Elements" data of the two last columns by the two coordinates contains in "nodes" array. The two arrays are very huge, I have to automate it.

This posts refers to an old one: Replace data of an array by 2 values of a second array

with a difference that arrays are very huge (Elements: (3342558,5) and nodes: (581589,4)) and the previous way out does not work.

An example :

    import numpy as np
    
    Elements = np.array([[1.,11.,14.],[2.,12.,13.]])
    
    nodes = np.array([[11.,0.,0.],[12.,1.,1.],[13.,2.,2.],[14.,3.,3.]])
    
    results = np.array([[1., 0., 0., 3., 3.],
    [2., 1., 1., 2., 2.]])

The previous way out proposed by hpaulj

    e = Elements[:,1:].ravel().astype(int)
    n=nodes[:,0].astype(int)
    
    I, J = np.where(e==n[:,None])
    
    results = np.zeros((e.shape[0],2),nodes.dtype)
    results[J] = nodes[I,:1]
    results = results.reshape(2,4)

But with huge arrays, this script does not work:
DepreciationWarning: elementwise comparison failed; this will raise an error in the future...

like image 213
user3601754 Avatar asked Oct 20 '20 15:10

user3601754


2 Answers

Most of the game would be to figure out the corresponding matching indices from Elements in nodes.

Approach #1

Since it seems you are open to conversion to integer, let's assume we could take them as integers. With that, we could use an array-assignment + mapping based method, as shown below :

ar = Elements.astype(int)
a = ar[:,1:].ravel()
nd = nodes[:,0].astype(int)

n = a.max()+1
# for generalized case of neagtive ints in a or nodes having non-matching values:
# n = max(a.max()-min(0,a.min()), nd.max()-min(0,nd.min()))+1

lookup = np.empty(n, dtype=int)
lookup[nd] = np.arange(len(nd))
indices = lookup[a]

nc = (Elements.shape[1]-1)*(nodes.shape[1]-1) # 4 for given setup
out = np.concatenate((ar[:,0,None], nodes[indices,1:].reshape(-1,nc)),axis=1)

Approach #2

We could also use np.searchsorted to get those indices.

For nodes having rows sorted based on first col and matching case, we can simply use :

indices = np.searchsorted(nd, a)

For not-necessarily sorted case and matching case :

sidx = nd.argsort()
idx = np.searchsorted(nd, a, sorter=sidx)
indices = sidx[idx]

For non-matching case, use an invalid bool array :

invalid = idx==len(nd)
idx[invalid] = 0
indices = sidx[idx]

Approach #3

Another with concatenation + sorting -

b = np.concatenate((nd,a))
sidx = b.argsort(kind='stable')

n = len(nd)
v = sidx<n
counts = np.diff(np.flatnonzero(np.r_[v,True]))
r = np.repeat(sidx[v], counts)

indices = np.empty(len(a), dtype=int)
indices[sidx[~v]-n] = r[sidx>=n]

To detect non-matching ones, use :

nd[indices] != a

Port the idea here to numba :

from numba import njit

def numba1(Elements, nodes):
    a = Elements[:,1:].ravel()
    nd = nodes[:,0]
    b = np.concatenate((nd,a))
    sidx = b.argsort(kind='stable')
    
    n = len(nodes)        
    ncols = Elements.shape[1]-1
    size = nodes.shape[1]-1        
    dt = np.result_type(Elements.dtype, nodes.dtype)
    nc = ncols*size
    
    out = np.empty((len(Elements),1+nc), dtype=dt)
    out[:,0] = Elements[:,0]
    return numba1_func(out, sidx, nodes, n, ncols, size)

@njit
def numba1_func(out, sidx, nodes, n, ncols, size):
    N = len(sidx)    
    for i in range(N):
        if sidx[i]<n:
            cur_id = sidx[i]
            continue
        else:
            idx = sidx[i]-n        
            row = idx//ncols
            col = idx-row*ncols        
            cc = col*size+1
            for ii in range(size):
                out[row, cc+ii] = nodes[cur_id,ii+1]
    return out
like image 73
Divakar Avatar answered Oct 08 '22 19:10

Divakar


Would you consider using pandas?

import pandas as pd
Elements = np.array([[1.,11.,14.],[2.,12.,13.]])
nodes = np.array([[11.,0.,0.],[12.,1.,1.],[13.,2.,2.],[14.,3.,3.]])

df_elements = pd.DataFrame(Elements,columns = ['idx','node1','node2'])
df_nodes = pd.DataFrame(nodes, columns = ['node_id','x','y'])

#Double merge to get the coordinates from df_nodes
results = df_elements.merge(df_nodes, left_on = 'node1', right_on="node_id", how='left').merge(df_nodes, left_on="node2",right_on = "node_id", how='left')[['idx',"x_x",'y_x','x_y','y_y']].values

Output

array([[1., 0., 0., 3., 3.],
       [2., 1., 1., 2., 2.]])
like image 40
Sebastien D Avatar answered Oct 08 '22 18:10

Sebastien D