I have two numpy arrays "Elements" and "nodes". My aim is to gather some data of these arrays. I need to replace "Elements" data of the two last columns by the two coordinates contains in "nodes" array. The two arrays are very huge, I have to automate it.
This posts refers to an old one: Replace data of an array by 2 values of a second array
with a difference that arrays are very huge (Elements: (3342558,5) and nodes: (581589,4)) and the previous way out does not work.
An example :
import numpy as np
Elements = np.array([[1.,11.,14.],[2.,12.,13.]])
nodes = np.array([[11.,0.,0.],[12.,1.,1.],[13.,2.,2.],[14.,3.,3.]])
results = np.array([[1., 0., 0., 3., 3.],
[2., 1., 1., 2., 2.]])
The previous way out proposed by hpaulj
e = Elements[:,1:].ravel().astype(int)
n=nodes[:,0].astype(int)
I, J = np.where(e==n[:,None])
results = np.zeros((e.shape[0],2),nodes.dtype)
results[J] = nodes[I,:1]
results = results.reshape(2,4)
But with huge arrays, this script does not work:DepreciationWarning: elementwise comparison failed; this will raise an error in the future
...
Most of the game would be to figure out the corresponding matching indices from Elements
in nodes
.
Approach #1
Since it seems you are open to conversion to integer, let's assume we could take them as integers. With that, we could use an array-assignment
+ mapping
based method, as shown below :
ar = Elements.astype(int)
a = ar[:,1:].ravel()
nd = nodes[:,0].astype(int)
n = a.max()+1
# for generalized case of neagtive ints in a or nodes having non-matching values:
# n = max(a.max()-min(0,a.min()), nd.max()-min(0,nd.min()))+1
lookup = np.empty(n, dtype=int)
lookup[nd] = np.arange(len(nd))
indices = lookup[a]
nc = (Elements.shape[1]-1)*(nodes.shape[1]-1) # 4 for given setup
out = np.concatenate((ar[:,0,None], nodes[indices,1:].reshape(-1,nc)),axis=1)
Approach #2
We could also use np.searchsorted
to get those indices
.
For nodes having rows sorted based on first col and matching case, we can simply use :
indices = np.searchsorted(nd, a)
For not-necessarily sorted case and matching case :
sidx = nd.argsort()
idx = np.searchsorted(nd, a, sorter=sidx)
indices = sidx[idx]
For non-matching case, use an invalid bool array :
invalid = idx==len(nd)
idx[invalid] = 0
indices = sidx[idx]
Approach #3
Another with concatenation
+ sorting
-
b = np.concatenate((nd,a))
sidx = b.argsort(kind='stable')
n = len(nd)
v = sidx<n
counts = np.diff(np.flatnonzero(np.r_[v,True]))
r = np.repeat(sidx[v], counts)
indices = np.empty(len(a), dtype=int)
indices[sidx[~v]-n] = r[sidx>=n]
To detect non-matching ones, use :
nd[indices] != a
Port the idea here to numba
:
from numba import njit
def numba1(Elements, nodes):
a = Elements[:,1:].ravel()
nd = nodes[:,0]
b = np.concatenate((nd,a))
sidx = b.argsort(kind='stable')
n = len(nodes)
ncols = Elements.shape[1]-1
size = nodes.shape[1]-1
dt = np.result_type(Elements.dtype, nodes.dtype)
nc = ncols*size
out = np.empty((len(Elements),1+nc), dtype=dt)
out[:,0] = Elements[:,0]
return numba1_func(out, sidx, nodes, n, ncols, size)
@njit
def numba1_func(out, sidx, nodes, n, ncols, size):
N = len(sidx)
for i in range(N):
if sidx[i]<n:
cur_id = sidx[i]
continue
else:
idx = sidx[i]-n
row = idx//ncols
col = idx-row*ncols
cc = col*size+1
for ii in range(size):
out[row, cc+ii] = nodes[cur_id,ii+1]
return out
Would you consider using pandas
?
import pandas as pd
Elements = np.array([[1.,11.,14.],[2.,12.,13.]])
nodes = np.array([[11.,0.,0.],[12.,1.,1.],[13.,2.,2.],[14.,3.,3.]])
df_elements = pd.DataFrame(Elements,columns = ['idx','node1','node2'])
df_nodes = pd.DataFrame(nodes, columns = ['node_id','x','y'])
#Double merge to get the coordinates from df_nodes
results = df_elements.merge(df_nodes, left_on = 'node1', right_on="node_id", how='left').merge(df_nodes, left_on="node2",right_on = "node_id", how='left')[['idx',"x_x",'y_x','x_y','y_y']].values
Output
array([[1., 0., 0., 3., 3.],
[2., 1., 1., 2., 2.]])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With