How can I find a number which occurs an odd number of times in a SORTED array in O(n) time?

Question

I have thought about this problem again and again but got nowhere, so I am posting it here. Maybe someone else's point of view will help me make it work.

The question is: we are given a SORTED array in which every value occurs an EVEN number of times, except one, which occurs an ODD number of times. We need to find that value in O(log n) time.

It is easy to find the solution in O(n) time, but it looks pretty tricky to do it in O(log n) time.
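For reference, here is one possible linear-time baseline in C. The question does not say which O(n) method the asker had in mind; this sketch assumes int elements and uses XOR, which cancels every value that appears an even number of times. The name find_odd_linear is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the value that occurs an odd number of
 * times, assuming every other value occurs an even number of times.
 * Runs in O(n) and does not rely on the array being sorted. */
int find_odd_linear(const int *a, size_t n) {
    /* One odd count plus even counts means the total length is odd. */
    assert(a != NULL && n % 2 == 1);
    int acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc ^= a[i]; /* pairs of equal values cancel out */
    }
    return acc;
}
```

For the array 0 0 1 1 1 2 2 3 3 this returns 1.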

user382751 · Accepted Answer

Theorem: Every deterministic algorithm for this problem probes Ω(log² n) memory locations in the worst case.

Proof (completely rewritten in a more formal style):

Let k > 0 be an odd integer and let n = k². We describe an adversary that forces (log₂ (k + 1))² = Ω(log² n) probes.

We call the maximal subsequences of identical elements groups. The adversary's possible inputs consist of k length-k segments x₁ x₂ … x_k. For each segment x_j, there exists an integer b_j ∈ [0, k] such that x_j consists of b_j copies of j - 1 followed by k - b_j copies of j. Each group overlaps at most two segments, and each segment overlaps at most two groups.

    Group boundaries
    |   |     |   |   |
     0 0 1 1 1 2 2 3 3
    |     |     |     |
    Segment boundaries

Wherever there is an increase of two, we assume a double boundary by convention.

    Group boundaries
    |     ||       |   |
     0 0 0  2 2 2 2 3 3

Claim: The location of the j^th group boundary (1 ≤ j ≤ k) is uniquely determined by the segment x_j.

Proof: It's just after the ((j - 1) k + b_j)^th memory location, and x_j uniquely determines b_j. //
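The segment construction and the boundary formula above can be sketched in C as follows. This is only an illustration: the names build_segments and boundary_after are hypothetical, the array b holds b_1 … b_k at indices 0 … k-1, and the code does not check that the resulting input has exactly one odd group.

```c
#include <assert.h>
#include <stddef.h>

/* Fills out[0 .. k*k) with the segments x_1 .. x_k, where x_j consists of
 * b[j-1] copies of j-1 followed by k - b[j-1] copies of j. */
void build_segments(size_t k, const size_t *b, int *out) {
    for (size_t j = 1; j <= k; j++) {
        assert(b[j - 1] <= k);
        for (size_t i = 0; i < k; i++) {
            out[(j - 1) * k + i] = (i < b[j - 1]) ? (int)(j - 1) : (int)j;
        }
    }
}

/* Number of memory locations that precede the j-th group boundary
 * (1 <= j <= k): the boundary sits just after location (j-1)k + b_j. */
size_t boundary_after(size_t k, const size_t *b, size_t j) {
    assert(j >= 1 && j <= k);
    return (j - 1) * k + b[j - 1];
}
```

With k = 3 and b = {2, 2, 1} this produces 0 0 1 1 1 2 2 3 3, with boundaries after locations 2, 5 and 7. With b = {3, 0, 1} it produces 0 0 0 2 2 2 2 3 3, where the first two boundaries coincide after location 3 (the double boundary).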

We say that the algorithm has observed the j^th group boundary in case the results of its probes of x_j uniquely determine x_j. By convention, the beginning and the end of the input are always observed. It is possible for the algorithm to uniquely determine the location of a group boundary without observing it.

    Group boundaries
    |   X   |   |     |
     0 0 ? 1 2 2 3 3 3
    |     |     |     |
    Segment boundaries

Given only 0 0 ?, the algorithm cannot tell for sure whether ? is a 0 or a 1. In context, however, ? must be a 1, as otherwise there would be three odd groups, and the group boundary at X can be inferred. These inferences could be problematic for the adversary, but it turns out that they can be made only after the group boundary in question is "irrelevant".

Claim: At any given point during the algorithm's execution, consider the set of group boundaries that it has observed. Exactly one consecutive pair is at odd distance, and the odd group lies between them.

Proof: Every other consecutive pair bounds only even groups. //

Define the odd-length subsequence bounded by the special consecutive pair to be the relevant subsequence.

Claim: No group boundary in the interior of the relevant subsequence is uniquely determined. If there is at least one such boundary, then the identity of the odd group is not uniquely determined.

Proof: Without loss of generality, assume that each memory location not in the relevant subsequence has been probed and that each segment contained in the relevant subsequence has exactly one location that has not been probed. Suppose that the j^th group boundary (call it B) lies in the interior of the relevant subsequence. By hypothesis, the probes to x_j determine B's location up to two consecutive possibilities. We call the one at odd distance from the left observed boundary odd-left and the other odd-right. For both possibilities, we work left to right and fix the location of every remaining interior group boundary so that the group to its left is even. (We can do this because they each have two consecutive possibilities as well.) If B is at odd-left, then the group to its left is the unique odd group. If B is at odd-right, then the last group in the relevant subsequence is the unique odd group. Both are valid inputs, so the algorithm has uniquely determined neither the location of B nor the odd group. //

Example:

    Observed group boundaries; relevant subsequence marked by […]
    [             ]   |
     0 0 Y 1 1 Z 2 3 3
    |     |     |     |
    Segment boundaries

    Possibility #1: Y=0, Z=2
    Possibility #2: Y=1, Z=2
    Possibility #3: Y=1, Z=1
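The three possibilities in this example can be checked mechanically. The C sketch below is not part of the proof; it simply counts the odd-length groups of an array by a linear scan. The name count_odd_groups and its odd_value output parameter are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the number of maximal runs of equal elements that have odd
 * length. If odd_value is non-NULL and at least one such run exists,
 * the value of the first odd run is stored in *odd_value. */
size_t count_odd_groups(const int *a, size_t n, int *odd_value) {
    assert(a != NULL || n == 0);
    size_t odd = 0;
    size_t i = 0;
    while (i < n) {
        size_t j = i + 1;
        while (j < n && a[j] == a[i]) {
            j++; /* extend the current group */
        }
        if ((j - i) % 2 == 1) {
            if (odd == 0 && odd_value != NULL) {
                *odd_value = a[i];
            }
            odd++;
        }
        i = j; /* jump to the start of the next group */
    }
    return odd;
}
```

Each of the three fillings of Y and Z gives exactly one odd group, namely 0, 1 and 2 respectively, so the probes made so far cannot distinguish them. By contrast, the earlier input 0 0 ? 1 2 2 3 3 3 with ? = 0 has three odd groups, which is why ? = 1 could be inferred there.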

As a consequence of this claim, the algorithm, regardless of how it works, must narrow the relevant subsequence to one group. By definition, it therefore must observe some group boundaries. The adversary now has the simple task of keeping open as many possibilities as it can.

At any given point during the algorithm's execution, the adversary is internally committed to one possibility for each memory location outside of the relevant subsequence. At the beginning, the relevant subsequence is the entire input, so there are no initial commitments. Whenever the algorithm probes an uncommitted location of x_j, the adversary must commit to one of two values: j - 1 or j. If it can avoid letting the j^th boundary be observed, it chooses a value that leaves at least half of the remaining possibilities (with respect to observation). Otherwise, it chooses so as to keep at least half of the groups in the relevant subsequence and commits values for the others.

In this way, the adversary forces the algorithm to observe at least log₂ (k + 1) group boundaries, and in observing the j^th group boundary, the algorithm is forced to make at least log₂ (k + 1) probes.

Extensions:

This result extends straightforwardly to randomized algorithms by randomizing the input, replacing "at best halved" (from the algorithm's point of view) with "at best halved in expectation", and applying standard concentration inequalities.

It also extends to the case where no group can be larger than s copies; in this case the lower bound is Ω(log n log s).

Nabb · Answer

A sorted array suggests a binary search. We have to redefine equality and comparison. Equality simply means that the group has an odd number of elements. We can do the comparison by observing the index of the first or last element of the group: the first element of a group will be at an even index (0-based) before the odd group, and at an odd index after the odd group. We can find the first and last elements of a group using binary search. The total cost is O((log N)²).
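A minimal C sketch of the search described above, assuming int elements and an input that really contains exactly one odd group. The names find_odd_group, lower_bound and upper_bound are illustrative and do not come from the answer.

```c
#include <assert.h>
#include <stddef.h>

/* First index i in [lo, hi) with a[i] >= v, or hi if there is none. */
static size_t lower_bound(const int *a, size_t lo, size_t hi, int v) {
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < v) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* First index i in [lo, hi) with a[i] > v, or hi if there is none. */
static size_t upper_bound(const int *a, size_t lo, size_t hi, int v) {
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] <= v) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* Returns the value whose group has odd length. The range [lo, hi)
 * always starts and ends on group boundaries and contains the odd group. */
int find_odd_group(const int *a, size_t n) {
    assert(a != NULL && n % 2 == 1);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int v = a[mid];
        size_t first = lower_bound(a, lo, mid, v); /* start of v's group */
        size_t end = upper_bound(a, mid, hi, v);   /* one past its end   */
        if ((end - first) % 2 == 1) {
            return v;      /* "equality": this group is the odd one */
        }
        if (first % 2 == 0) {
            lo = end;      /* even start index: odd group lies to the right */
        } else {
            hi = first;    /* odd start index: odd group lies to the left */
        }
    }
    assert(0 && "input has no odd group");
    return 0;
}
```

Each iteration discards at least half of the range and spends O(log N) on the two inner binary searches, which is where the O((log N)²) total comes from.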

PROOF OF O((log N)²)

  T(2) = 1                  // to make the summation nice
  T(N) = log(N) + T(N/2)    // log(N) is the cost of finding the first/last elements

For some N=2^k,

T(2^k) = (log 2^k) + T(2^(k-1))
       = (log 2^k) + (log 2^(k-1)) + T(2^(k-2))
       = (log 2^k) + (log 2^(k-1)) + (log 2^(k-2)) + ... + (log 2^2) + 1
       = k + (k-1) + (k-2) + ... + 1
       = k(k+1)/2
       = (k² + k)/2
       = ((log N)² + log N)/2
       = O((log N)²)
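As a quick sanity check of the summation, the recurrence and the closed form can be compared numerically. This is just a sketch; the function names t_recurrence and t_closed_form are hypothetical, and k stands for log₂ N.

```c
#include <assert.h>

/* T(2^k) evaluated directly from the recurrence:
 * T(2) = 1 and T(2^k) = k + T(2^(k-1)). */
unsigned long t_recurrence(unsigned k) {
    assert(k >= 1);
    if (k == 1) {
        return 1; /* base case T(2) = 1 */
    }
    return k + t_recurrence(k - 1);
}

/* Closed form k(k+1)/2 from the derivation above. */
unsigned long t_closed_form(unsigned k) {
    assert(k >= 1);
    return (unsigned long)k * (k + 1) / 2;
}
```

For example, k = 3 (N = 8) gives 3 + 2 + 1 = 6 from both functions.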

Tags: c, algorithm, asymptotic-complexity

AGeek
