 

How can I find a number which occurs an odd number of times in a SORTED array in O(log n) time?

I have a question that I have thought over again and again without getting anywhere, so I am posting it here. Maybe other people's viewpoints will help me make it work.

The question is: we are given a SORTED array in which every value occurs an EVEN number of times, except one, which occurs an ODD number of times. We need to find that value in O(log n) time.

It is easy to find the solution in O(n) time, but it looks pretty tricky to do in O(log n) time.
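For reference, the O(n) baseline mentioned above can be sketched as one pass over the runs of equal values. This is a minimal Python sketch, assuming the input is a Python list in which exactly one run has odd length; `find_odd_linear` is a hypothetical name.

```python
def find_odd_linear(a):
    """Return the value that occurs an odd number of times in sorted list a.

    O(n) baseline: walk the array run by run and return the first run
    whose length is odd. Assumes exactly one such run exists.
    """
    i = 0
    n = len(a)
    while i < n:
        j = i
        # Advance j past the run of values equal to a[i].
        while j < n and a[j] == a[i]:
            j += 1
        if (j - i) % 2 == 1:
            return a[i]
        i = j
    raise ValueError("no value occurs an odd number of times")


print(find_odd_linear([1, 1, 2, 2, 2, 3, 3]))  # 2
```

For integer inputs, XOR-ing all elements together also works in O(n), without using the sortedness at all.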

AGeek asked Jul 05 '10 16:07



2 Answers

Theorem: Every deterministic algorithm for this problem probes Ω(log² n) memory locations in the worst case.

Proof (completely rewritten in a more formal style):

Let k > 0 be an odd integer and let n = k². We describe an adversary that forces (log₂(k + 1))² = Ω(log² n) probes.

We call the maximal subsequences of identical elements groups. The adversary's possible inputs consist of k length-k segments x_1 x_2 … x_k. For each segment x_j, there exists an integer b_j ∈ [0, k] such that x_j consists of b_j copies of j - 1 followed by k - b_j copies of j. Each group overlaps at most two segments, and each segment overlaps at most two groups.

    Group boundaries
    |   |     |   |   |
     0 0 1 1 1 2 2 3 3
    |     |     |     |
    Segment boundaries

Wherever consecutive values increase by two (so that the group in between is empty), we draw a double boundary by convention.

    Group boundaries
    |     ||       |   |
     0 0 0  2 2 2 2 3 3
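To make the input family concrete, here is a minimal Python sketch that builds an input from the vector (b_1, …, b_k). `build_input` is a hypothetical helper; it does not check that the result has exactly one odd group.

```python
def build_input(b):
    """Build the length-k*k input x_1 x_2 ... x_k from b = [b_1, ..., b_k].

    Segment x_j holds b_j copies of j - 1 followed by k - b_j copies of j.
    """
    k = len(b)
    out = []
    for j, bj in enumerate(b, start=1):
        if not 0 <= bj <= k:
            raise ValueError("each b_j must lie in [0, k]")
        out.extend([j - 1] * bj + [j] * (k - bj))
    return out


print(build_input([2, 2, 1]))  # [0, 0, 1, 1, 1, 2, 2, 3, 3]
print(build_input([3, 0, 1]))  # [0, 0, 0, 2, 2, 2, 2, 3, 3]
```

The two calls reproduce the two diagrams: b = (2, 2, 1) gives the first, and b = (3, 0, 1) gives the second, where b_2 = 0 leaves the group of 1s empty.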

Claim: The location of the jth group boundary (1 ≤ j ≤ k) is uniquely determined by the segment x_j.

Proof: It's just after the ((j - 1)k + b_j)th memory location, and x_j uniquely determines b_j. //
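A quick sanity check of this claim, as a minimal Python sketch: the jth boundary equals the number of elements smaller than j, which the standard library's `bisect_left` computes on a sorted list. The function names are hypothetical.

```python
from bisect import bisect_left


def boundary_from_segment(k, j, bj):
    # All of x_1 .. x_{j-1} (each of length k) plus the b_j copies of j - 1
    # at the start of x_j come before the j-th boundary.
    return (j - 1) * k + bj


def boundary_from_array(a, j):
    # Number of elements strictly less than j in the sorted list a.
    return bisect_left(a, j)


b = [2, 2, 1]
k = len(b)
a = [0, 0, 1, 1, 1, 2, 2, 3, 3]  # the first example diagram
for j in range(1, k + 1):
    assert boundary_from_segment(k, j, b[j - 1]) == boundary_from_array(a, j)
print([boundary_from_segment(k, j, b[j - 1]) for j in range(1, k + 1)])  # [2, 5, 7]
```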

We say that the algorithm has observed the jth group boundary if the results of its probes of x_j uniquely determine x_j. By convention, the beginning and the end of the input are always observed. It is possible for the algorithm to uniquely determine the location of a group boundary without observing it.

    Group boundaries
    |   X   |   |     |
     0 0 ? 1 2 2 3 3 3
    |     |     |     |
    Segment boundaries

Given only 0 0 ?, the algorithm cannot tell for sure whether ? is a 0 or a 1. In context, however, ? must be a 1, as otherwise there would be three odd groups, and the group boundary at X can be inferred. These inferences could be problematic for the adversary, but it turns out that they can be made only after the group boundary in question is "irrelevant".
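The inference can be checked by brute force. This minimal Python sketch (with a hypothetical helper `odd_groups`, built on `itertools.groupby`) tries both values of ? and lists the odd-length groups each produces.

```python
from itertools import groupby


def odd_groups(a):
    """Return the values whose group (run) in sorted list a has odd length."""
    return [v for v, run in groupby(a) if len(list(run)) % 2 == 1]


for q in (0, 1):
    a = [0, 0, q, 1, 2, 2, 3, 3, 3]
    print(q, odd_groups(a))
# 0 [0, 1, 3]   three odd groups, so this is not a legal input
# 1 [3]         exactly one odd group
```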

Claim: At any given point during the algorithm's execution, consider the set of group boundaries that it has observed. Exactly one consecutive pair is at odd distance, and the odd group lies between them.

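Proof: Every consecutive pair that does not enclose the odd group bounds only even groups, so it is at even distance; the pair that encloses the odd group is therefore the only one at odd distance. //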
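As a small illustration, assuming the observed boundaries are given as a sorted list of positions that includes 0 and n, this Python sketch picks out the unique consecutive pair at odd distance. `relevant_pair` is a hypothetical name.

```python
def relevant_pair(observed):
    """Return the unique consecutive pair of observed boundary positions
    (sorted, including 0 and n) that lies at odd distance."""
    pairs = [(lo, hi) for lo, hi in zip(observed, observed[1:])
             if (hi - lo) % 2 == 1]
    if len(pairs) != 1:
        raise ValueError("expected exactly one pair at odd distance")
    return pairs[0]


# Input 0 0 1 1 1 2 2 3 3 with boundaries 0, 2, 7, 9 observed:
print(relevant_pair([0, 2, 7, 9]))  # (2, 7)
```

The odd group (the three 1s at positions 2 to 4) indeed lies between 2 and 7.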

Define the odd-length subsequence bounded by the special consecutive pair to be the relevant subsequence.

Claim: No group boundary in the interior of the relevant subsequence is uniquely determined. If there is at least one such boundary, then the identity of the odd group is not uniquely determined.

Proof: Without loss of generality, assume that each memory location not in the relevant subsequence has been probed and that each segment contained in the relevant subsequence has exactly one location that has not been probed. Suppose that the jth group boundary (call it B) lies in the interior of the relevant subsequence. By hypothesis, the probes to x_j determine B's location up to two consecutive possibilities. We call the one at odd distance from the left observed boundary odd-left and the other odd-right. For both possibilities, we work left to right and fix the location of every remaining interior group boundary so that the group to its left is even. (We can do this because they each have two consecutive possibilities as well.) If B is at odd-left, then the group to its left is the unique odd group. If B is at odd-right, then the last group in the relevant subsequence is the unique odd group. Both are valid inputs, so the algorithm has uniquely determined neither the location of B nor the odd group. //

Example:

    Observed group boundaries; relevant subsequence marked by […]
    [             ]   |
     0 0 Y 1 1 Z 2 3 3
    |     |     |     |
    Segment boundaries

    Possibility #1: Y=0, Z=2
    Possibility #2: Y=1, Z=2
    Possibility #3: Y=1, Z=1
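The three possibilities can be confirmed by enumeration. In this minimal Python sketch, Y ranges over {0, 1} and Z over {1, 2} (the two values segment x_1 and x_2 allow), and only assignments with exactly one odd group are kept; `odd_groups` and `valid` are hypothetical names.

```python
from itertools import groupby, product


def odd_groups(a):
    # Values whose run (group) in the sorted list a has odd length.
    return [v for v, run in groupby(a) if len(list(run)) % 2 == 1]


# x_1 = 0 0 Y and x_2 = 1 1 Z each have one unprobed location;
# x_3 = 2 3 3 has been fully probed.
valid = []
for y, z in product((0, 1), (1, 2)):
    a = [0, 0, y, 1, 1, z, 2, 3, 3]
    odd = odd_groups(a)
    if len(odd) == 1:  # keep only legal inputs
        valid.append((y, z, odd[0]))

for y, z, g in valid:
    print(f"Y={y}, Z={z}: odd group is {g}")
# Y=0, Z=2: odd group is 0
# Y=1, Z=1: odd group is 2
# Y=1, Z=2: odd group is 1
```

Each legal possibility puts the odd group somewhere different, which is why the algorithm cannot yet name it.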

As a consequence of this claim, the algorithm, regardless of how it works, must narrow the relevant subsequence to one group. By definition, it therefore must observe some group boundaries. The adversary now has the simple task of keeping open as many possibilities as it can.

At any given point during the algorithm's execution, the adversary is internally committed to one possibility for each memory location outside of the relevant subsequence. At the beginning, the relevant subsequence is the entire input, so there are no initial commitments. Whenever the algorithm probes an uncommitted location of x_j, the adversary must commit to one of two values: j - 1, or j. If it can avoid letting the jth boundary be observed, it chooses a value that leaves at least half of the remaining possibilities (with respect to observation). Otherwise, it chooses so as to keep at least half of the groups in the relevant interval and commits values for the others.

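In this way, the adversary forces the algorithm to observe at least log₂(k + 1) group boundaries, and in observing the jth group boundary, the algorithm is forced to make at least log₂(k + 1) probes of x_j. Different boundaries are observed through disjoint segments, so this adds up to at least (log₂(k + 1))² probes; since k = √n, that exceeds (½ log₂ n)² = Ω(log² n).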
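To illustrate the per-segment cost, here is a minimal Python sketch of the adversary's halving rule inside one segment x_j, played against one particular probing strategy (binary search over the offset). It only demonstrates that this strategy needs ⌈log₂(k + 1)⌉ probes to pin down b_j; it is not a proof for all strategies. `probes_to_pin_down` is a hypothetical name.

```python
def probes_to_pin_down(k):
    """Count binary-search probes needed to learn b_j in a segment of
    length k when the adversary answers each probe so as to keep the
    larger set of still-consistent values of b_j (initially 0..k)."""
    lo, hi = 0, k  # b_j is known to lie in [lo, hi]
    probes = 0
    while lo < hi:
        p = (lo + hi) // 2  # probe the 0-based offset p within x_j
        probes += 1
        # Seeing j - 1 at offset p means b_j > p; seeing j means b_j <= p.
        if hi - p > p - lo + 1:
            lo = p + 1  # answering "j - 1" keeps more candidates
        else:
            hi = p      # answering "j" keeps at least as many
    return probes


for k in (1, 3, 7, 15):
    print(k, probes_to_pin_down(k))  # 1 1, 3 2, 7 3, 15 4 (one pair per line)
```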


Extensions:

This result extends straightforwardly to randomized algorithms by randomizing the input, replacing "at best halved" (from the algorithm's point of view) with "at best halved in expectation", and applying standard concentration inequalities.

It also extends to the case where no group can be larger than s copies; in this case the lower bound is Ω(log n log s).

user382751 answered Oct 11 '22 06:10



A sorted array suggests a binary search. We have to redefine equality and comparison. Equality simply means that a group has an odd number of elements. We can do the comparison by looking at the index of the first (or last) element of the group: the first element of a group is at an even index (0-based) if the group comes before the odd group, and at an odd index if it comes after. We can find the first and last elements of a group using binary search. The total cost is O((log N)²).
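A minimal Python sketch of this approach, assuming a non-empty sorted list in which exactly one value occurs an odd number of times. `find_odd_sorted` is a hypothetical name, and the first/last searches use the standard library's `bisect` module rather than hand-written binary searches.

```python
from bisect import bisect_left, bisect_right


def find_odd_sorted(a):
    """Return the value occurring an odd number of times in sorted list a,
    using a binary search over groups (O((log N)^2) comparisons)."""
    lo, hi = 0, len(a) - 1  # invariant: lo starts a group, hi ends one
    while lo <= hi:
        mid = (lo + hi) // 2
        x = a[mid]
        # Binary searches for the first and last index of x's group.
        first = bisect_left(a, x, lo, hi + 1)
        last = bisect_right(a, x, lo, hi + 1) - 1
        if (last - first + 1) % 2 == 1:
            return x        # "equal": this group has odd length
        if first % 2 == 0:
            lo = last + 1   # even start index: odd group lies to the right
        else:
            hi = first - 1  # odd start index: odd group lies to the left
    raise ValueError("no value occurs an odd number of times")


print(find_odd_sorted([0, 0, 1, 1, 1, 2, 2, 3, 3]))  # 1
print(find_odd_sorted([1, 1, 2, 2, 3, 3, 3, 3, 4]))  # 4
```

Each iteration moves past mid in one direction, so at least half of the current range is discarded, and the two `bisect` calls per iteration cost O(log N); together that is the O((log N)²) bound.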

PROOF OF O((log N)²)

    T(2) = 1                  // to make the summation nice
    T(N) = log(N) + T(N/2)    // log(N) is finding the first/last elements

For some N=2^k,

    T(2^k) = log(2^k) + T(2^(k-1))
           = log(2^k) + log(2^(k-1)) + T(2^(k-2))
           = log(2^k) + log(2^(k-1)) + log(2^(k-2)) + ... + log(2^2) + 1
           = k + (k-1) + (k-2) + ... + 1
           = k(k+1)/2
           = (k² + k)/2
           = ((log N)² + log N)/2
           = O((log N)²)
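The closed form can also be checked numerically. This minimal Python sketch evaluates the recurrence with logarithms taken base 2 (consistent with T(2) = 1 = log 2) for powers of two only; `T` is a hypothetical name mirroring the recurrence.

```python
def T(n):
    """T(2) = 1, T(N) = log2(N) + T(N/2), for N a power of two, N >= 2."""
    if n == 2:
        return 1
    # For a power of two, n.bit_length() - 1 equals log2(n) exactly.
    return (n.bit_length() - 1) + T(n // 2)


for k in range(1, 11):
    assert T(2 ** k) == k * (k + 1) // 2
print(T(2 ** 10))  # 55
```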
Nabb answered Oct 11 '22 08:10
