
Efficient way to sort an array with a condition

I have a numpy array:

a = np.array(["dcba", "abc", "bca", "bcda", "tda", "a"])

I also have a vectorized Levenshtein edit distance function that measures the distance between a given string and each element of an array. For example, for the string ab:

l_distv("ab", a)

returns:

array([3, 1, 3, 4, 3, 1])
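
The question does not show how l_distv is implemented. To make the example reproducible, here is a minimal stand-in, assuming a plain dynamic-programming Levenshtein distance wrapped with np.vectorize. The helper name levenshtein and the argument order are assumptions, not the asker's actual code.

```python
import numpy as np

def levenshtein(s, t):
    """Edit distance between two strings (classic two-row dynamic programming)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]

# Hypothetical stand-in for the asker's l_distv: the first argument is the
# query string, the second is the array of candidate strings.
l_distv = np.vectorize(levenshtein, otypes=[int])

a = np.array(["dcba", "abc", "bca", "bcda", "tda", "a"])
dists = l_distv("ab", a)
print(dists)
```

np.vectorize is a convenience loop rather than a true vectorization, so this sketch is about reproducing the numbers, not about speed.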

I'd like to reorder the array so that every element with an edit distance smaller than 2 moves to the front, while the remaining elements follow in their original order. The result would be:

array(["abc", "a", "dcba", "bca", "bcda", "tda"])

I've done this, but my solution is pretty ugly, and I assume there is a more efficient way.

enedene asked Jun 19 '26 10:06


2 Answers

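Put the elements and their edit distances in a dictionary (here dists is the array returned by l_distv("ab", a)):

dictionary = dict(zip(a, dists))

then sort the dictionary items by edit distance. Note that this orders every element by its distance, so the elements with a distance of 2 or more do not necessarily keep their original order:

import operator
sorted_dictionary = sorted(dictionary.items(), key=operator.itemgetter(1))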
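
If only the close matches should move to the front, one possible variation is to sort on a boolean key rather than on the distance itself. This is a sketch under the assumption that dists holds the values from the question; it relies on Python's sorted being stable, so each group keeps its original relative order.

```python
# Example data from the question; dists stands in for l_distv("ab", a).
a = ["dcba", "abc", "bca", "bcda", "tda", "a"]
dists = [3, 1, 3, 4, 3, 1]

# The key is False for distances below 2 and True otherwise.
# False sorts before True, and sorted() is stable, so elements
# within each group stay in their original order.
pairs = sorted(zip(a, dists), key=lambda pair: pair[1] >= 2)
result = [word for word, _ in pairs]
print(result)
```

Zipping the two sequences instead of building a dictionary also avoids losing entries when a contains duplicate strings.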
bkaf answered Jun 21 '26 05:06


Assuming that those distance values are stored in an array dists, here's one approach -

sort_idx = dists.argsort()
mask = dists < 2
out = np.concatenate((a[sort_idx[mask[sort_idx]]],a[~mask]))

Sample run -

In [144]: a
Out[144]: 
array(['dcba', 'abc', 'bca', 'bcda', 'tda', 'a'], 
      dtype='|S4')

In [145]: dists
Out[145]: array([3, 1, 3, 4, 3, 0]) # Different from listed sample to 
                                    # show how it handles sorting

In [146]: sort_idx = dists.argsort()

In [147]: mask = dists < 2

In [148]: np.concatenate((a[sort_idx[mask[sort_idx]]],a[~mask]))
Out[148]: 
array(['a', 'abc', 'dcba', 'bca', 'bcda', 'tda'], 
      dtype='|S4')

The above approach concatenates two indexed parts of a, which creates two intermediate arrays and might not be very efficient in terms of runtime. With performance in mind, you can instead build a single concatenated index array and index into a with it in one go. The last line of the previous implementation then becomes -

out = a[np.concatenate((sort_idx[mask[sort_idx]],np.where(~mask)[0]))]
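
The approach above also sorts the front group by distance. If the close matches should instead keep their original order, as in the expected output in the question, a possible alternative is a stable argsort on the inverted mask. This is a sketch, not part of the original answer; the function name close_first and the threshold parameter are made up for illustration.

```python
import numpy as np

def close_first(a, dists, threshold=2):
    """Move elements with dists < threshold to the front.

    Both groups keep their original relative order.
    """
    mask = np.asarray(dists) < threshold
    # ~mask is False for close elements. A stable sort places False
    # before True without reordering elements that compare equal.
    order = np.argsort(~mask, kind="stable")
    return np.asarray(a)[order]

a = np.array(["dcba", "abc", "bca", "bcda", "tda", "a"])
dists = np.array([3, 1, 3, 4, 3, 1])
out = close_first(a, dists)
print(out)
```

With the distances from the question this yields abc and a first, followed by the remaining strings in their original order.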
Divakar answered Jun 21 '26 06:06


