
Calculate distance between arrays that contain NaN

Consider array1 and array2, with:

array1 = [a1 a2 NaN ... an]
array2 = [[NaN b2 b3 ... bn],
          [b21 NaN b23 ... b2n],
          ...]

Both arrays are NumPy arrays. There is an easy way to compute the Euclidean distance between array1 and each row of array2:

EuclideanDistance = np.sqrt(((array1 - array2)**2).sum(axis=1))

What messes up this computation are the NaN values. Of course, I could easily replace NaN with some number. But instead, I want to do the following:

When I compare array1 with row_x of array2, I count the columns in which one of the arrays has NaN and the other doesn't. Let's assume the count is 3. I will then delete these columns from both arrays and compute the Euclidean distance between the two. In the end, I add a minus_value * count to the calculated distance.

Now, I cannot think of a fast and efficient way to do this. Can somebody help me?

Here are a few of my ideas:

minus = 1000
dist = np.zeros(shape=(array1.shape[0])) # this array will store the distance of array1 to each row of array2
array1 = np.repeat(array1, array2.shape[0], axis=0) # now array1 has the same dimensions as array2
for i in range(0, array1.shape[0]):
    boolarray = np.logical_or(np.isnan(array1[i]), np.isnan(array2[i]))
    count = boolarray.sum()
    deleteIdxs = boolarray.nonzero() # this should give the indices where boolarray is True
    dist[i] = np.sqrt(((np.delete(array1[i], deleteIdxs, axis=0) - np.delete(array2[i], deleteIdxs, axis=0))**2).sum(axis=0))
    dist[i] = dist[i] + count*minus

These lines look more than ugly to me, however. Also, I keep getting an index error: Apparently deleteIdxs contains an index that is out of range for array1. Don't know how this can even be.

Luk asked May 08 '20 10:05


1 Answer

You can find all the positions where the value is NaN using:

indices_1 = np.isnan(array1)
indices_2 = np.isnan(array2)

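Which you can combine with a logical OR (array1 is broadcast against every row of array2):

indices_total = indices_1 | indices_2

Note that indices_total is 2-D, so it cannot index the 1-D array1 directly, and indexing array2 with it would flatten the result. To keep the non-NaN values, apply the mask one row at a time:

i = 0  # index of the row of array2 to compare against
array_1_not_nan = array1[~indices_total[i]]
array_2_not_nan = array2[i][~indices_total[i]]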
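Putting the masks together, the whole computation from the question can be sketched without a loop. This is only a sketch under a few assumptions: the function name nan_penalized_distances is made up for illustration, array1 is 1-D with shape (n,), array2 is 2-D with shape (m, n), and columns where both arrays hold NaN are dropped without a penalty (the question only describes penalising columns where exactly one side is NaN). As a side note, np.repeat on a 1-D array with axis=0 repeats each element rather than stacking copies of the array, which is likely where the index error in the question comes from. Broadcasting avoids the repeat entirely.

```python
import numpy as np

def nan_penalized_distances(array1, array2, minus=1000):
    """Distance from array1 (shape (n,)) to each row of array2 (shape (m, n)).

    Hypothetical helper, not part of NumPy. Columns with a NaN on either
    side are left out of the Euclidean distance; each column where exactly
    one side is NaN adds `minus` to the result.
    """
    nan1 = np.isnan(array1)   # shape (n,)
    nan2 = np.isnan(array2)   # shape (m, n)

    either = nan1 | nan2      # columns to leave out, per row
    mismatch = nan1 ^ nan2    # exactly one side is NaN (assumed penalty rule)

    # Instead of deleting columns, set their difference to 0 so they
    # contribute nothing to the sum of squares.
    diff = np.where(either, 0.0, array1 - array2)
    dist = np.sqrt((diff ** 2).sum(axis=1))

    return dist + minus * mismatch.sum(axis=1)


array1 = np.array([1.0, 2.0, np.nan, 4.0])
array2 = np.array([[np.nan, 2.0, 3.0, 4.0],
                   [1.0, np.nan, np.nan, 0.0]])

print(nan_penalized_distances(array1, array2))
# row 0: two mismatched columns, remaining distance 0 -> 2000.0
# row 1: one mismatched column, remaining distance 4  -> 1004.0
```

If columns where both arrays are NaN should be penalised as well, replace mismatch with either in the last line of the function.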
Nathan answered Sep 30 '22 20:09