I have a numpy array:
a = np.array(["dcba", "abc", "bca", "bcda", "tda", "a"])
I also have a vectorized Levenshtein edit-distance function, l_distv, that measures the distance between a given string and every element of an array. For example, for the string "ab":
l_distv("ab", a)
returns:
array([3, 1, 3, 4, 3, 1])
I'd like to reorder the array so that every element with an edit distance smaller than 2 moves to the front, while the remaining elements follow in their original order. So the result would be:
array(["abc", "a", "dcba", "bca", "bcda", "tda"])
I've done this, but it's pretty ugly, and I assume there is a more efficient way.
First, add the elements and their edit distances to a dictionary:
dictionary = dict(zip(a,array))
Then sort the dictionary by edit distance:
sorted_dictionary = sorted(dictionary.items(), key=operator.itemgetter(1))
Assuming that those distance values are stored in an array dists, here's one approach -
sort_idx = dists.argsort()
mask = dists < 2
out = np.concatenate((a[sort_idx[mask[sort_idx]]],a[~mask]))
Sample run -
In [144]: a
Out[144]:
array(['dcba', 'abc', 'bca', 'bcda', 'tda', 'a'],
dtype='|S4')
In [145]: dists
Out[145]: array([3, 1, 3, 4, 3, 0]) # Different from listed sample to
# show how it handles sorting
In [146]: sort_idx = dists.argsort()
In [147]: mask = dists < 2
In [148]: np.concatenate((a[sort_idx[mask[sort_idx]]],a[~mask]))
Out[148]:
array(['a', 'abc', 'dcba', 'bca', 'bcda', 'tda'],
dtype='|S4')
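To make the indexing expression easier to follow, here is a minimal sketch that splits it into named intermediate steps, using the same sample values as the run above. The names near_sorted and far_original are illustrative only, and kind="stable" is an added assumption so that ties among equal distances keep their original order; the original snippet uses the default sort.

```python
import numpy as np

a = np.array(["dcba", "abc", "bca", "bcda", "tda", "a"])
dists = np.array([3, 1, 3, 4, 3, 0])  # same sample distances as the run above

# Indices that would sort dists in ascending order (stable: ties keep input order).
sort_idx = dists.argsort(kind="stable")

# True where the element is "close" (distance < 2).
mask = dists < 2

# mask[sort_idx] reorders the mask to follow the sorted order, so indexing
# sort_idx with it keeps only the close elements, nearest first.
near_sorted = sort_idx[mask[sort_idx]]

# Indices of the remaining elements, in their original order.
far_original = np.flatnonzero(~mask)

out = a[np.concatenate((near_sorted, far_original))]
print(out)
```

Here np.flatnonzero(~mask) plays the same role as np.where(~mask)[0].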
The above approach indexes into a twice and then concatenates the two resulting arrays, which might not be very efficient in terms of runtime. So, with performance in mind, you can instead concatenate the two index arrays and index into a just once. The last line of the previous implementation then changes like so -
out = a[np.concatenate((sort_idx[mask[sort_idx]],np.where(~mask)[0]))]
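Note that the approach above orders the close elements by distance. If the goal is exactly the output listed in the question, where the close elements also keep their original relative order, a stable partition may be enough. The following is a sketch of that alternative, not the method shown above; the helper name move_close_first and its threshold parameter are hypothetical.

```python
import numpy as np

def move_close_first(a, dists, threshold=2):
    """Hypothetical helper: move elements with dists < threshold to the
    front, preserving the original order within both groups."""
    mask = np.asarray(dists) < threshold
    # ~mask is False for close elements and True for the rest. A stable
    # argsort puts the False entries first and keeps ties in input order.
    order = np.argsort(~mask, kind="stable")
    return np.asarray(a)[order]

a = np.array(["dcba", "abc", "bca", "bcda", "tda", "a"])
dists = np.array([3, 1, 3, 4, 3, 1])  # distances from the question for "ab"

result = move_close_first(a, dists)
print(result)
```

With the question's distances this yields abc, a, dcba, bca, bcda, tda, which matches the requested result.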