Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find all subarrays of fixed length with a given ranking

I have an array of numbers, for example:

A = [1, 5, 2, 4, 3]

and an array that determines a rank, for example:

B = [0, 2, 1]

My goal is to find all the subarrays of A that "obey" the rank B. If a subarray obeys the rank, that means that the i-th smallest element of the subarray must have B[i] as its (subarray) index. So for a subarray to match, the smallest element within it must be in position 0, the second smallest must be in position 2, and the biggest element must be in position 1.

So for example here, there are two subarrays of A that match the ranking: [1, 5, 2] (because A[0] < A[2] < A[1]) and [2, 4, 3].

So far, I've managed to find a solution that has an O(mn) (m is len(A) and n is len(B)) time complexity, it iterates over all the subarrays of length 3 and verifies if they are correctly ordered:

A = [1, 5, 2, 4, 3]
B = [0, 2, 1]
m = len(A)
n = len(B)
for i in range(m - n + 1):
    current_subarray = A[i:i + n]
    # we now do n - 1 comparaisons to check whether the subarray is correctly ordered.
    for B_index in range(n - 1):
        if current_subarray[B[B_index]] > current_subarray[B[B_index + 1]]:
            break
    else:
        print("Subarray found:", current_subarray)

Result:

Subarray found: [1, 5, 2]
Subarray found: [2, 4, 3]

This works, but I was wondering if there was a better optimized algorithm (better than O(mn)) to fulfill this task.

like image 833
ping guo Avatar asked Jan 05 '19 17:01

ping guo


1 Answers

Instead of iterating over B to compare ranks, you could use scipy.stats.rankdata to get the ranks directly:

from scipy.stats import rankdata

A = [1, 5, 2, 4, 3]
B = [0, 2, 1]

m = len(A)
n = len(B)

for i in range(m - n + 1):
    current_subarray = A[i:i + n]

    ranked_numbers = (rankdata(current_subarray).astype(int) - 1).tolist()
    if ranked_numbers == B:
        print("Subarray found:", current_subarray)

# Subarray found: [1, 5, 2]
# Subarray found: [2, 4, 3]

Note: rankdata() starts ranks from 1 instead of 0, which is why the above minuses 1 from every element in the numpy array.

like image 159
RoadRunner Avatar answered Oct 14 '22 16:10

RoadRunner