Computing the smallest positive integer not covered by any of a set of intervals

Q: How do you find the smallest positive number in an array?

First sort the array. Then initialize a variable to 1 and using a loop scan through the array. Check the value of the variable if it matches the current array element, increment it if that is the case. The value of the variable after the loop is the smallest positive missing integer.

Q: How do you find the smallest positive integer in Python?

that, given an array A of N integers, returns the smallest positive integer (greater than 0) that does not occur in A. For example, given A = [1, 3, 6, 4, 1, 2], the function should return 5. Given A = [1, 2, 3], the function should return 4. Given A = [−1, −3], the function should return 1.

Tags:

python

algorithm

big-o

time-complexity

intervals

Someone posted this question here a few weeks ago, but it looked awfully like homework without prior research, and the OP promptly removed it after getting a few downvotes.

The question itself was rather interesting though, and I've been thinking about it for a week without finding a satisfying solution. Hopefully someone can help?

The question is as follows: given a list of N integer intervals, whose bounds can take any values from 0 to N³, find the smallest integer i such that i does not belong to any of the input intervals.

For example, if given the list [3,5] [2,8] [0,3] [10,13] (N = 4) , the algorithm should return 9.

The simplest solution that I can think of runs in O(n log(n)), and consists of three steps:

Sort the intervals by increasing lower bound
- If the smallest lower bound is > 0, return 0;
- Otherwise repeatedly merge the first interval with the second, until the first interval (say [a, b]) does not touch the second (say [c, d]) — that is, until b + 1 < c, or until there is only one interval.
Return b + 1

This simple solution runs in O(n log(n)), but the original poster wrote that the algorithm should run in O(n). That's trivial if the intervals are already sorted, but the example that the OP gave included unsorted intervals. I guess it must have something to do with the N³ bound, but I'm not sure what... Hashing? Linear time sorting? Ideas are welcome.

Here is a rough python implementation for the algorithm described above:

def merge(first, second):
    (a, b), (c, d) = first, second
    if c <= b + 1:
        return (a, max(b, d))
    else:
        return False

def smallest_available_integer(intervals):
    # Sort in reverse order so that push/pop operations are fast
    intervals.sort(reverse = True)

    if (intervals == [] or intervals[-1][0] > 0):
        return 0

    while len(intervals) > 1:
        first = intervals.pop()
        second = intervals.pop()

        merged = merge(first, second)
        if merged:
            print("Merged", first, "with", second, " -> ", merged)
            intervals.append(merged)
        else:
            print(first, "cannot be merged with", second)
            return first[1] + 1

print(smallest_available_integer([(3,5), (2,8), (0,3), (10,13)]))

Output:

Merged (0, 3) with (2, 8)  ->  (0, 8)
Merged (0, 8) with (3, 5)  ->  (0, 8)
(0, 8) cannot be merged with (10, 13)
9

859

asked Oct 10 '13 15:10

Clément

1 Answers

Elaborating on @mrip's comment: you can do this in O(n) time by using the exact algorithm you've described but changing how the sorting algorithm works.

Typically, radix sort uses base 2: the elements are divvied into two different buckets based on whether their bits are 0 or 1. Each round of radix sort takes time O(n), and there is one round per bit of the largest number. Calling that largest nunber U, this means the time complexity is O(n log U).

However, you can change the base of the radix sort to other bases. Using base b, each round takes time O(n + b), since it takes time O(b) to initialize and iterate over the buckets and O(n) time to distribute elements into the buckets. There are then log_b U rounds. This gives a runtime of O((n + b)log_b U).

The trick here is that since the maximum number U = n³, you can set b = n and use a base-n radix sort. The number of rounds is now log_n U = log_n n³ = 3 and each round takes O(n) time, so the total work to sort the numbers is O(n). More generally, you can sort numbers in the range [0, n^k) in time O(kn) for any k. If k is a fixed constant, this is O(n) time.

Combined with your original algorithm, this solves the problem in time O(n).

Hope this helps!

answered Nov 06 '22 04:11

templatetypedef

Related questions
                            
                                Create SQL table with correct column types from CSV
                            
                                SQLAlchemy with PostgreSQL and Full Text Search
                            
                                Sublime text: How to get the file name of the current view
                            
                                Run Python CGI Application on Heroku
                            
                                Django: override get_FOO_display()
                            
                                Using statements on either side of a python ternary conditional
                            
                                SQL Parsing library for Python [duplicate]
                            
                                Python Scraping JavaScript using Selenium and Beautiful Soup
                            
                                Why reset python global value doesn't take effect
                            
                                How to track the progress of individual tasks inside a group which forms the header to a chord in celery?
                            
                                How to crawl an entire website with Scrapy?
                            
                                wrapping struct with nested enum - reference in vector template
                            
                                Java vs Python HMAC-SHA256 Mismatch
                            
                                Refer to group inside group with Regex
                            
                                How can I turn a python 3.3 script into executable file? I found PyInstaller and py2exe, but both did not support 3.3
                            
                                PyCrypto : AssertionError("PID check failed. RNG must be re-initialized after fork(). Hint: Try Random.atfork()")
                            
                                Finding a minimal subarray of n integers of sum >= k in linear time
                            
                                How to hold key down with Selenium
                            
                                Using Python's Requests library to navigate webpages / Click buttons
                            
                                openpyxl create a function that references a cell in another sheet

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With