For performance reasons I'd like to use the Python list insert() method. I will demonstrate why:
My final list is a 31k * 31k matrix:
```python
w = 31 * 10**3
h = 31 * 10**3
distance_matrix = [[0] * w for _ in range(h)]
```
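For a sense of scale, here is a back-of-envelope memory estimate. It assumes 64-bit CPython, where each list slot is an 8-byte pointer and each float object takes 24 bytes, and compares that with packed array storage of the kind NumPy uses.

```python
import sys

n = 31 * 10**3           # side length of the square matrix
cells = n * n            # total number of entries

# 64-bit CPython: every list slot is an 8-byte pointer to a Python object.
pointer_bytes = cells * 8
# Each distinct float stored in a list is its own object (24 bytes on 64-bit CPython).
float_object_bytes = cells * sys.getsizeof(1.0)

# A packed (NumPy-style) array stores only the raw values.
float64_bytes = cells * 8
float32_bytes = cells * 4

for label, size in [
    ("list slots (pointers)", pointer_bytes),
    ("float objects in list", float_object_bytes),
    ("packed float64 array", float64_bytes),
    ("packed float32 array", float32_bytes),
]:
    print(f"{label:24s} {size / 1e9:6.2f} GB")
```

With 961,000,000 cells, the nested list needs about 7.7 GB of pointers before any distances are stored, and roughly another 23 GB once each cell holds its own float object. A packed float32 array needs about 3.8 GB in total.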
I intend to update the matrix one element at a time:
```python
for i in range(len(index)):
    for j in range(len(index)):
        distance_matrix[index[i]][index[j]] = k[0][i][j]
```
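If the matrix is a NumPy array instead of a list of lists, the double loop above can be replaced by a single vectorized assignment using `numpy.ix_`. This is a minimal sketch that assumes `index` is a sequence of integer positions and `k[0]` is a `len(index) x len(index)` block of distances; the helper name `scatter_block` is hypothetical.

```python
import numpy as np

def scatter_block(distance_matrix, index, block):
    """Write block[a][b] into distance_matrix[index[a], index[b]] in one step.

    Hypothetical helper: assumes `index` holds integer row/column positions and
    `block` has shape (len(index), len(index)), e.g. k[0] from the question.
    """
    idx = np.asarray(index, dtype=np.intp)
    # np.ix_ builds an open mesh, so every (index[a], index[b]) pair is targeted at once.
    distance_matrix[np.ix_(idx, idx)] = block
    return distance_matrix

# Small demo: a 6 x 6 matrix, filling the rows/columns 1, 3 and 4.
D = np.zeros((6, 6), dtype=np.float32)
index = [1, 3, 4]
block = np.arange(9, dtype=np.float32).reshape(3, 3)
scatter_block(D, index, block)
print(D)
```

For the full problem, the matrix would be preallocated once with something like `np.zeros((h, w), dtype=np.float32)`, and each block written with one call instead of `len(index) ** 2` Python-level assignments.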
Obviously this doesn't perform well.
I'd rather start with an empty list and fill it up gradually, so that the work becomes heavy only at the end of the process (and stays cheap at the beginning):
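```python
distance_matrix = []
for i in range(len(index)):
    for j in range(len(index)):
        # Does not work: list.insert(pos, value) takes one integer position, and
        # [index[i]][index[j]] is a one-element list being indexed, not a 2-D position.
        distance_matrix.insert([index[i]][index[j]], k[0][i][j])
```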
But this multi-index or list-in-list insert doesn't seem to be possible.
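The reason is that `list.insert` works on a single list and shifts existing elements rather than writing into a slot. The short comparison below (plain Python, nothing assumed beyond the built-in `list`) shows the difference between inserting and assigning, and how a 2-D write on a nested list is really "pick row, then write column".

```python
row = [0, 0, 0, 0]

# list.insert(pos, value) shifts everything from `pos` onward one slot to the right,
# so the list grows and existing entries move (O(n) per call).
inserted = row.copy()
inserted.insert(2, 9)
print(inserted)  # [0, 0, 9, 0, 0]

# Item assignment overwrites the slot in place (O(1)) and keeps the length fixed.
assigned = row.copy()
assigned[2] = 9
print(assigned)  # [0, 0, 9, 0]

# On a nested list, a write at (i, j) is two steps: select row i, then assign column j.
matrix = [[0] * 3 for _ in range(2)]
matrix[1][2] = 9
print(matrix)  # [[0, 0, 0], [0, 0, 9]]
```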
How would you advise me to proceed? I've also looked into NumPy arrays, but without luck so far.
To be precise: the problem is updating the large (ordered) array of zeros index by index. In a pandas DataFrame I can use custom column and index labels, but that does not scale well in performance.
Additional information: I split the original data matrix into parts to compute the distance matrices in parallel. The difficulty is reassembling the full distance matrix from the computed values. Because the distance matrix is very large, even a simple list insert or element update takes a long time.
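One way to reassemble the parallel results is to preallocate a NumPy array once and write each returned block into it with `numpy.ix_`. This sketch assumes each worker returns a `(row_idx, col_idx, block)` tuple where `block` has shape `(len(row_idx), len(col_idx))`; that tuple layout and the `assemble` name are hypothetical, not something the original setup specifies. If the array does not fit in RAM, `numpy.memmap` can back an array of the same shape with a file.

```python
import numpy as np

def assemble(n, results, symmetric=True, dtype=np.float32):
    """Build an n x n distance matrix from partial results.

    Hypothetical layout: `results` yields (row_idx, col_idx, block) tuples,
    with block shaped (len(row_idx), len(col_idx)).
    """
    D = np.zeros((n, n), dtype=dtype)
    for row_idx, col_idx, block in results:
        r = np.asarray(row_idx, dtype=np.intp)
        c = np.asarray(col_idx, dtype=np.intp)
        D[np.ix_(r, c)] = block
        if symmetric:
            # Distances are symmetric, so the (r, c) block also fills (c, r).
            D[np.ix_(c, r)] = np.asarray(block).T
    return D

# Demo: 4 points split into parts {0, 1} and {2, 3}; three blocks cover the matrix.
results = [
    ([0, 1], [0, 1], [[0, 1], [1, 0]]),
    ([2, 3], [2, 3], [[0, 2], [2, 0]]),
    ([0, 1], [2, 3], [[5, 6], [7, 8]]),  # cross-part distances
]
D = assemble(4, results)
print(D)
```

Exploiting symmetry means the workers only need to compute the diagonal blocks and the cross blocks above the diagonal.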
I think this approach achieves what I had in mind:
```python
distance_matrix = []

def dynamic_append(x, i, j, val):
    """Set x[i][j] = val, growing the nested list as needed."""
    # Add empty rows until row i exists.
    while len(x) <= i:
        x.append([])
    # Pad row i up to column j; None marks cells that have not been computed yet.
    row = x[i]
    if len(row) <= j:
        row.extend([None] * (j - len(row) + 1))
    row[j] = val
    return x

for i in range(len(index)):
    for j in range(len(index)):
        distance_matrix = dynamic_append(distance_matrix, index[i], index[j], k[0][i][j])
```
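A related alternative to `dynamic_append` (a different technique, not the approach above) is to collect `(i, j, value)` triples in a dict keyed by position and build the dense matrix once at the end. This keeps the "start empty and fill gradually" idea without padding rows on every call; `build_matrix` is a hypothetical name.

```python
def build_matrix(entries, fill=None):
    """Collect (i, j, value) triples, then build a dense list-of-lists once.

    Hypothetical helper: cells that never received a value are set to `fill`.
    """
    cells = {}
    for i, j, val in entries:
        cells[(i, j)] = val  # average O(1) per entry, in any order
    if not cells:
        return []
    n_rows = 1 + max(i for i, _ in cells)
    n_cols = 1 + max(j for _, j in cells)
    matrix = [[fill] * n_cols for _ in range(n_rows)]
    for (i, j), val in cells.items():
        matrix[i][j] = val
    return matrix

# With the question's variables, the entries could be produced lazily, e.g.:
#   entries = ((index[i], index[j], k[0][i][j])
#              for i in range(len(index)) for j in range(len(index)))
demo = build_matrix([(0, 0, 0.0), (2, 1, 1.5), (1, 2, 1.5)])
print(demo)
```

A dict entry costs considerably more memory than an array slot, so this suits partially filled matrices; for a fully dense 31k x 31k result, a preallocated NumPy array is the more economical container.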