Algorithm to determine if array contains n...n+m?

Note to everyone (stuff collected from comments)

This solution works under the assumption you can modify the array, then uses in-place Radix sort (which achieves O(N) speed).

Other mathy-solutions have been put forth, but I'm not sure any of them have been proved. There are a bunch of sums that might be useful, but most of them run into a blowup in the number of bits required to represent the sum, which will violate the constant extra space guarantee. I also don't know if any of them are capable of producing a distinct number for a given set of numbers. I think a sum of squares might work, which has a known formula to compute it (see Wolfram's)

New insight (well, more of musings that don't help solve it but are interesting and I'm going to bed):

So, it has been mentioned to maybe use sum + sum of squares. No one knew if this worked or not, and I realized that it only becomes an issue when (x + y) = (n + m), such as the fact 2 + 2 = 1 + 3. Squares also have this issue thanks to Pythagorean triples (so 3^2 + 4^2 + 25^2 == 5^2 + 7^2 + 24^2, and the sum of squares doesn't work). If we use Fermat's last theorem, we know this can't happen for n^3. But we also don't know if there is no x + y + z = n for this (unless we do and I don't know it). So no guarantee this, too, doesn't break - and if we continue down this path we quickly run out of bits.

In my glee, however, I forgot to note that you can break the sum of squares, but in doing so you create a normal sum that isn't valid. I don't think you can do both, but, as has been noted, we don't have a proof either way.

I must say, finding counterexamples is sometimes a lot easier than proving things! Consider the following sequences, all of which have a sum of 28 and a sum of squares of 140:

[1, 2, 3, 4, 5, 6, 7]
[1, 1, 4, 5, 5, 6, 6] 
[2, 2, 3, 3, 4, 7, 7]

I could not find any such examples of length 6 or less. If you want an example that has the proper min and max values too, try this one of length 8:

[1, 3, 3, 4, 4, 5, 8, 8]

Simpler approach (modifying hazzen's idea):

An integer array of length m contains all the numbers from n to n+m-1 exactly once iff

every array element is between n and n+m-1
there are no duplicates

(Reason: there are only m values in the given integer range, so if the array contains m unique values in this range, it must contain every one of them once)

If you are allowed to modify the array, you can check both in one pass through the list with a modified version of hazzen's algorithm idea (there is no need to do any summation):

For all array indexes i from 0 to m-1 do
1. If array[i] < n or array[i] >= n+m => RETURN FALSE ("value out of range found")
2. Calculate j = array[i] - n (this is the 0-based position of array[i] in a sorted array with values from n to n+m-1)
3. While j is not equal to i
  1. If list[i] is equal to list[j] => RETURN FALSE ("duplicate found")
  2. Swap list[i] with list[j]
  3. Recalculate j = array[i] - n
RETURN TRUE

I'm not sure if the modification of the original array counts against the maximum allowed additional space of O(1), but if it doesn't this should be the solution the original poster wanted.

By working with a[i] % a.length instead of a[i] you reduce the problem to needing to determine that you've got the numbers 0 to a.length - 1.

We take this observation for granted and try to check if the array contains [0,m).

Find the first node that's not in its correct position, e.g.

0 1 2 3 7 5 6 8 4 ;     the original dataset (after the renaming we discussed)
        ^
        `---this is position 4 and the 7 shouldn't be here

Swap that number into where it should be. i.e. swap the 7 with the 8:

0 1 2 3 8 5 6 7 4 ; 
        |     `--------- 7 is in the right place.
        `--------------- this is now the 'current' position

Now we repeat this. Looking again at our current position we ask:

"is this the correct number for here?"

If not, we swap it into its correct place.
If it is in the right place, we move right and do this again.

Following this rule again, we get:

0 1 2 3 4 5 6 7 8 ;     4 and 8 were just swapped

This will gradually build up the list correctly from left to right, and each number will be moved at most once, and hence this is O(n).

If there are dupes, we'll notice it as soon is there is an attempt to swap a number backwards in the list.

Why do the other solutions use a summation of every value? I think this is risky, because when you add together O(n) items into one number, you're technically using more than O(1) space.

Simpler method:

Step 1, figure out if there are any duplicates. I'm not sure if this is possible in O(1) space. Anyway, return false if there are duplicates.

Step 2, iterate through the list, keep track of the lowest and highest items.

Step 3, Does (highest - lowest) equal m ? If so, return true.

Any one-pass algorithm requires Omega(n) bits of storage.

Suppose to the contrary that there exists a one-pass algorithm that uses o(n) bits. Because it makes only one pass, it must summarize the first n/2 values in o(n) space. Since there are C(n,n/2) = 2^Theta(n) possible sets of n/2 values drawn from S = {1,...,n}, there exist two distinct sets A and B of n/2 values such that the state of memory is the same after both. If A' = S \ A is the "correct" set of values to complement A, then the algorithm cannot possibly answer correctly for the inputs

A A' - yes

B A' - no

since it cannot distinguish the first case from the second.

Q.E.D.

Vote me down if I'm wrong, but I think we can determine if there are duplicates or not using variance. Because we know the mean beforehand (n + (m-1)/2 or something like that) we can just sum up the numbers and square of difference to mean to see if the sum matches the equation (mn + m(m-1)/2) and the variance is (0 + 1 + 4 + ... + (m-1)^2)/m. If the variance doesn't match, it's likely we have a duplicate.

EDIT: variance is supposed to be (0 + 1 + 4 + ... + [(m-1)/2]^2)*2/m, because half of the elements are less than the mean and the other half is greater than the mean.

If there is a duplicate, a term on the above equation will differ from the correct sequence, even if another duplicate completely cancels out the change in mean. So the function returns true only if both sum and variance matches the desrired values, which we can compute beforehand.

Related questions
                            
                                Remove the last element from an array
                            
                                Remove object from a list of objects in python
                            
                                Iterate through a C array
                            
                                Convert ByteBuffer to byte array java [duplicate]
                            
                                PHP array_merge with numerical keys
                            
                                Array functions in jQuery
                            
                                Initializing Constant Static Array In Header File
                            
                                Move an array element to a new index in PHP
                            
                                str_replace with array
                            
                                Getting array from std:vector
                            
                                How to access a property of an object (stdClass Object) member/element of an array? [duplicate]
                            
                                how to perform max/mean pooling on a 2d array using numpy
                            
                                Compute intersection of two arrays in JavaScript [duplicate]
                            
                                Can Java use String as an index array key? (ex: array["a"]=1;)
                            
                                get first and last element in array
                            
                                Swapping the dimensions of a numpy array
                            
                                MySQL PHP - SELECT WHERE id = array()? [duplicate]
                            
                                NumPy first and last element from array
                            
                                Get last key-value pair in PHP array
                            
                                How to append a string to each element of a Bash array?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Algorithm to determine if array contains n...n+m?

Tags:

arrays

algorithm

big-o

puzzle

People also ask

Note to everyone (stuff collected from comments)

New insight (well, more of musings that don't help solve it but are interesting and I'm going to bed):

Simpler approach (modifying hazzen's idea):

Recent Activity

Donate For Us