Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find longest quasi-constant sub-sequence of a sequence

I had this test earlier today, and I tried to be too clever and hit a road block. Unfortunately I got stuck in this mental rut and wasted too much time, failing this portion of the test. I solved it afterward, but maybe y'all can help me get out of the initial rut I was in.

Problem definition:

An unordered and non-unique sequence A consisting of N integers (all positive) is given. A subsequence of A is any sequence obtained by removing none, some or all elements from A. The amplitude of a sequence is the difference between the largest and the smallest element in this sequence. The amplitude of the empty subsequence is assumed to be 0.

For example, consider the sequence A consisting of six elements such that:

A[0] = 1
A[1] = 7
A[2] = 6
A[3] = 2
A[4] = 6
A[5] = 4

A subsequence of array A is called quasi-constant if its amplitude does not exceed 1. In the example above, the subsequences [1,2], [6,6], and [6,6,7] are quasi-constant. Subsequence [6, 6, 7] is the longest possible quasi-constant subsequence of A.

Now, find a solution that, given a non-empty zero-indexed array A consisting of N integers, returns the length of the longest quasi-constant subsequence of array A. For example, given sequence A outlined above, the function should return 3, as explained.

Now, I solved this in python 3.6 after the fact using a sort-based method with no classes (my code is below), but I didn't initially want to do that as sorting on large lists can be very slow. It seemed this should have a relatively simple formulation as a breadth-first tree-based class, but I couldn't get it right. Any thoughts on this?

My class-less sort-based solution:

def amp(sub_list):
    if len(sub_list) <2:
        return 0
    else:
        return max(sub_list) - min(sub_list)

def solution(A):
    A.sort()
    longest = 0
    idxStart = 0
    idxEnd = idxStart + 1
    while idxEnd <= len(A):
        tmp = A[idxStart:idxEnd]
        if amp(tmp) < 2:
            idxEnd += 1
            if len(tmp) > longest:
                longest = len(tmp)
        else:
            idxStart = idxEnd
            idxEnd = idxStart + 1
    return longest
like image 357
NichD Avatar asked Apr 01 '18 00:04

NichD


People also ask

How do you find the longest subsequence?

To find the LIS for a given array, we need to return max(L(i)) where 0 < i < n. Formally, the length of the longest increasing subsequence ending at index i, will be 1 greater than the maximum of lengths of all longest increasing subsequences ending at indices before i, where arr[j] < arr[i] (j < i).

How do you find the longest increasing subsequence with a difference K?

Fill the dp array with the maximum value between current dp indices and current max_length indices+1. Fill the max_length array with the maximum value between current dp indices and current max_length indices. Longest sub sequence length is the maximum value in dp array.

How do you find the length of the longest non decreasing subsequence?

Time: O(K * N/K * log(N/K)) = O(N * log(N/K)) , where N <= 10^5 is length of arr , K <= N . We have total K new arr, each array have up to N/K elements. We need O(M * logM) to find the longest non-decreasing subsequence of an array length M .

What is the maximum length of clear subsequence?

Explanation: The longest increasing subsequence is {3,10,20}. Explanation: The longest incresing subsequence is {2,3,7,101} or {2,3,7,18} or {2,5,7,101} or {2,5,7,18}. Can there be duplicate values present in the subsequence? (Ans: We need a strictly increasing sequence. So, No!)


2 Answers

As Andrey Tyukin pointed out, you can solve this problem in O(n) time, which is better than the O(n log n) time you'd likely get from either sorting or any kind of tree based solution. The trick is to use dictionaries to count the number of occurrences of each number in the input, and use the count to figure out the longest subsequence.

I had a similar idea to him, but I had though of a slightly different implementation. After a little testing, it looks like my approach is a quite a bit faster, so I'm posting it as my own answer. It's quite short!

from collections import Counter

def solution(seq):
    if not seq:     # special case for empty input sequence
        return 0
    counts = Counter(seq)
    return max(counts[x] + counts[x+1] for x in counts)

I suspect this is faster than Andrey's solution because the running time for both of our solutions really take O(n) + O(k) time where k is the number of distinct values in the input (and n is the total number of values in the input). My code handles the O(n) part very efficiently by handing off the sequence to the Counter constructor, which is implemented in C. It is likely to be a bit slower (on a per-item basis) to deal with the O(k) part, since it needs a generator expression. Andrey's code does the reverse (it runs slower Python code for the O(n) part, and uses faster builtin C functions for the O(k) part). Since k is always less than or equal to n (perhaps a lot less if the sequence has a lot of repeated values), my code is faster overall. Both solutions are still O(n) though, and both should be much better than sorting for large inputs.

like image 111
Blckknght Avatar answered Oct 16 '22 08:10

Blckknght


I don't know how BFS is supposed to help here.

Why not simply run once through the sequence and count how many elements every possible quasi-constant subsequence would have?

from collections import defaultdict

def longestQuasiConstantSubseqLength(seq):
  d = defaultdict(int)
  for s in seq:
    d[s] += 1
    d[s+1] += 1
  return max(d.values() or [0])

s = [1,7,6,2,6,4]

print(longestQuasiConstantSubseqLength(s))

prints:

3

as expected.

Explanation: Every non-constant quasi-constant subsequence is uniquely identified by the greatest number that it contains (there can be only two, take the greater one). Now, if you have a number s, it can either contribute to the quasi-constant subsequence that has s or s + 1 as the greatest number. So, just add +1 to the subsequences identified by s and s + 1. Then output the maximum of all counts.

You can't get it faster than O(n), because you have to look at every entry of the input sequence at least once.

like image 32
Andrey Tyukin Avatar answered Oct 16 '22 09:10

Andrey Tyukin