
Find an element in an array, but the element can jump

There is an array where all but one of the cells are 0, and we want to find the index of that single non-zero cell. The problem is that every time you check a cell in this array, the non-zero element will do one of the following:

  1. move forward by 1
  2. move backward by 1
  3. stay where it is.

For example, if that element is currently at position 10 and I check what is in arr[5], then the element may be at position 9, 10 or 11 after the check.

We only need to find the position where the element currently is, not where it started (which would be impossible).

The hard part is that if you write a simple for loop, there is really no way to know whether the element is currently ahead of you or behind you.
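
To make the setup concrete, here is a minimal Ruby sketch of the situation (the class and method names are made up purely for illustration):

# A made-up model of the problem: one hidden non-zero cell that may
# step -1, 0 or +1 (clamped to the array bounds) after every check.
class JumpingTarget
  def initialize(size)
    @size = size
    @pos  = rand(size)   # unknown starting cell
  end

  # Check index i: report whether we hit, then let the element move.
  def probe(i)
    hit = (i == @pos)
    @pos = (@pos + [-1, 0, 1].sample).clamp(0, @size - 1)
    hit
  end
end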

Some more context if it helps:

  1. The interviewer did give a hint: maybe I should move my pointer back after checking some number of cells. The problem is, when should I move back, and by how many slots?
  2. While "thinking out loud", I started listing a bunch of common approaches hoping that something would stick. When I said recursion, the interviewer did say "recursion is a good start". I don't know whether recursion really is the right approach, because I don't see how I can combine recursion with hint #1.
  3. The interviewer said this problem can't be solved in O(n^2). So we are looking at at least O(n^3), or maybe even exponential.
Asked by Kent on Mar 22 '16



1 Answer

Tl;dr: Your best bet is to keep checking each even index in the array in turn, wrapping around as many times as necessary until you find your target. On average you will stumble upon your target in the middle of your second pass.
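
To make that concrete, here is a minimal sketch of the strategy (it is the FastSweep x2 pattern from the benchmarks below: probe the even indices, then the odd ones, and wrap around; `target` is anything with a probe(i) method, such as the illustrative JumpingTarget sketched in the question):

# Probe 0, 2, 4, ..., then 1, 3, 5, ..., wrapping until a hit.
def find_jumper(target, size)
  loop do
    2.times do |off|                  # even pass, then odd pass
      off.step(size - 1, 2) do |i|
        return i if target.probe(i)   # the target moves after each probe
      end
    end
  end
end

# e.g. find_jumper(JumpingTarget.new(101), 101)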


First off, as many have already said, it is indeed impossible to ensure you will find your target element in any given amount of time: if the element knew where your next sample would be, it could always place itself somewhere else just in time. The best you can do is sample the array in a way that minimizes the expected number of accesses. Because each sample tells you nothing except whether it was successful, and a success means you stop sampling, an optimal strategy can be described simply as a sequence of indexes to check (illustrated just after the assumptions below), dependent only on the size of the array you're looking through. We can then test each strategy by simulation to see how well it performs on average. The results will depend on the specifics of the problem, so let's make some assumptions:

  • The question doesn't specify the starting position of our target. Let us assume it is chosen uniformly at random across the entire array.
  • The question doesn't specify the probability that our target moves. For simplicity, let's say it is independent of parameters such as the current position in the array, the time passed and the history of samples. Using the probability 1/3 for each option gives us the least information, so let's use that.
  • Let us test our algorithms on an array of 101 elements (indices 0 through 100; the fencepost is explained below). Also, let us run each algorithm one million times, just to be reasonably sure about its average-case behavior.
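
As argued above, a strategy is then nothing more than an infinite stream of indices, and the benchmark code at the end of this answer expresses each candidate as a Ruby Enumerator. A tiny illustration:

# Two strategies as infinite index sequences (101 cells assumed):
sweep  = Enumerator.new { |y| loop { 0.upto(100) { |i| y.yield i } } }
random = Enumerator.new { |y| loop { y.yield rand(101) } }

p sweep.first(8)   # => [0, 1, 2, 3, 4, 5, 6, 7]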

The algorithms I've tested are:

  • Random sampling: after each attempt we forget where we were looking and choose an entirely new index at random. Each sample has an independent 1/n chance of succeeding, so we expect to take n samples on average (see the quick check after this list). This is our control.
  • Sweep: try each position in sequence until our target is found. If our target wasn't moving, this would take n/2 samples on average. Our target is moving, however, so we may miss it on our first sweep.
  • Slow sweep: the same, except we test each position several times before moving on. Proposed by Patrick Trentin with a slowdown factor of 30x, tested with a slowdown factor of 2x.
  • Fast sweep: the opposite of slow sweep. After the first sample we skip (k-1) cells before testing the next one. The first pass starts at ary[0], the next at ary[1], and so on. Tested with each speed-up factor (k) from 2 to 5.
  • Left-right sweep: first we check each index in turn from left to right, then each index from right to left. This algorithm would be guaranteed to find our target if it were always moving (which it isn't).
  • Smart greedy: proposed by Aziuth. The idea is to track each cell's probability of holding our target and always sample the cell with the highest probability. On one hand this algorithm is relatively complex; on the other hand it sounds like it should give us optimal results.
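
Before the results, the quick check on the control promised above: with 101 cells every probe hits with probability 1/101 independently, so the number of probes is geometrically distributed and both its mean and standard deviation should sit near 101, which is exactly what the first result below shows.

# Geometric distribution: mean 1/p, standard deviation sqrt(1 - p)/p.
p = 1.0 / 101
puts 1 / p                  # => 101.0
puts Math.sqrt(1 - p) / p   # => ~100.5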

Results:

The results are shown as [average] ± [standard deviation].

  • Random sampling: 100.889145 ± 100.318212

    At this point I have realised a fencepost error in my code: the array has 101 cells rather than the intended 100. Good thing we have a control sample. This also establishes that we have in the ballpark of two or three digits of useful precision (precision grows with the square root of the number of samples), which is in line with other tests of this type.

  • Sweep: 100.327030 ± 91.210692

    The chance of our target slipping through the net counteracts the fact that an immobile target would be reached in n/2 samples on average. The algorithm doesn't really fare any better than random sampling on average, but it is more consistent in its performance and it isn't hard to implement either.

  • slow sweep (x0.5): 128.272588 ± 99.003681

    While the slow movement of our net means our target will probably get caught in the net during the first sweep and won't need a second sweep, it also means the first sweep takes twice as long. All in all, relying on the target moving onto us seems a little inefficient.

  • fast sweep x2: 75.981733 ± 72.620600
  • fast sweep x3: 84.576265 ± 83.117648
  • fast sweep x4: 88.811068 ± 87.676049
  • fast sweep x5: 91.264716 ± 90.337139

    That's... a little surprising at first. While skipping every other cell means we complete each lap in half as many turns, each lap also has a reduced chance of actually encountering the target. A nicer view is to compare Sweep and FastSweep in broom-space: rotate each sample so that the index being sampled is always at 0 and the target drifts towards the left a bit faster. In Sweep, the target approaches at a relative speed of 0, 1 or 2 cells per step. A quick parallel with the Fibonacci base suggests that the target should hit the broom/net around 62% of the time; if it misses, it takes another ~100 turns for the broom to come back around. In FastSweep, the target approaches at a relative speed of 1, 2 or 3 cells per step, so it slips through more often, but it also takes half as much time to retry. Since the retry time drops by more than the hit rate, it is advantageous to use FastSweep over Sweep.
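
    For what it's worth, here is a quick numeric check of that hit rate under a deliberately simplified model (a rough approximation rather than the exact Fibonacci argument): let d be the target's distance ahead of the broom, shrinking by 0, 1 or 2 each step with equal probability; a catch means d lands exactly on 0, and overshooting to -1 is a miss. That gives p(0) = 1, p(1) = 1/2 and p(d) = (p(d-1) + p(d-2)) / 2, which converges to about 2/3, in the same ballpark as the ~62% figure:

    # Catch probability for a distant target under the simplified
    # broom-space model described above.
    p_catch = [1.0, 0.5]
    2.upto(100) { |d| p_catch << (p_catch[-1] + p_catch[-2]) / 2 }
    puts p_catch.last   # => ~0.667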

  • Left-right sweep: 100.572156 ± 91.503060

    Mostly acts like an ordinary sweep, and its score and standard deviation reflect that. Not too surprising a result.

  • Aziuth's smart greedy: 87.982552 ± 85.649941

    At this point I have to admit a fault in my code: this algorithm's behavior depends heavily on its initial choices (which Aziuth left unspecified and which I randomised in my tests). For performance reasons, however, my implementation generates one randomised order and replays it for every run, so the results are characteristic of that particular randomisation rather than of the algorithm as a whole.

    Always picking the most likely spot should find our target as fast as possible, right? Unfortunately, this complex algorithm barely competes with FastSweep x3. Why? This is just speculation, but let us peek at the sequence Smart Greedy actually generates: during the first pass every cell has an equal probability of containing the target, so the algorithm has to pick arbitrarily. If it picks at random, it covers maybe 20% of the cells before the dips in probability left behind by earlier samples spread to all of them. After that the landscape is mostly smooth wherever the array hasn't been sampled recently, so the algorithm eventually stops sweeping and starts jumping around randomly. The real problem is that the algorithm is too greedy: it doesn't try to herd the target into a position where it could be picked off more easily.

    Nevertheless, this complex algorithm does fare better than both the simple Sweep and the random sampler. It still can't, however, compete with the simplicity and surprising efficiency of FastSweep. Repeated tests have shown that the initial randomisation can swing the efficiency anywhere between 80% run time (a 20% speedup) and 90% run time (a 10% speedup).

Finally, here's the code that was used to generate the results:

# Simulates the jumping target: a random walk on cells 0..limit.
class WalkSim
  attr_reader :limit, :current, :time, :p_stay

  def initialize(limit, p_stay)
    @p_stay  = p_stay            # probability of staying put each turn
    @limit   = limit
    @current = rand(limit + 1)   # uniform starting cell among 0..limit
    @time    = 0
  end

  # Check cell n, then let the target take one step (clamped to bounds).
  def poke(n)
    r = n == @current
    @current += (rand(2) == 1 ? 1 : -1) if rand > @p_stay
    @current = [0, @current, @limit].sort[1]   # clamp via the median
    @time += 1
    r
  end

  # Run `runs` simulations; the block supplies a fresh strategy
  # enumerator for each run. Returns a {time => count} histogram.
  def self.bench(limit, p_stay, runs)
    histogram = Hash.new(0)
    runs.times do
      sim = WalkSim.new(limit, p_stay)
      gen = yield
      nil until sim.poke(gen.next)
      histogram[sim.time] += 1
    end
    histogram.to_a.sort
  end
end

# Backport of Array#sum (built into Ruby 2.4+)
class Array; def sum; reduce 0, :+; end; end

# Mean and sample standard deviation of a {time => count} histogram.
def stats histogram
  count    = histogram.map{|k,v| v}.sum.to_f
  avg      = histogram.map{|k,v| k*v}.sum / count
  variance = histogram.map{|k,v| (k-avg)**2 * v}.sum / (count - 1)
  {avg: avg, stddev: variance ** 0.5}
end

RUNS  = 1_000_000
PSTAY = 1.0/3   # probability that the target stays put on a given turn
LIMIT = 100     # highest index; rand(LIMIT + 1) gives 101 cells

puts "random sampling"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) {
  Enumerator.new {|y| loop {y.yield rand(LIMIT + 1)}}
}

puts "sweep"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) { 
  Enumerator.new {|y|loop{0.upto(LIMIT){|i|y.yield i}}}
}

puts "x0.5 speed sweep"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) { 
  Enumerator.new {|y|loop{0.upto(LIMIT){|i|2.times{y.yield i}}}}
}
(2..5).each do |speed|
  puts "x#{speed} speed sweep"
  p stats WalkSim.bench(LIMIT, PSTAY, RUNS) { 
    Enumerator.new {|y|loop{speed.times{|off|off.step(LIMIT, speed){|i|y.yield i}}}}
  }
end
puts "sweep LR"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) { 
  Enumerator.new {|y|loop{
    0.upto(LIMIT){|i|y.yield i}
    LIMIT.downto(0){|i|y.yield i}
  }}
}

# Aziuth's smart greedy: track each cell's probability of holding the
# target, always probe the most likely cell (ties broken at random),
# then diffuse the probabilities by one step of the random walk.
$sg_gen = Enumerator.new do |y|
  probs = Array.new(LIMIT + 1){1.0 / (LIMIT + 1)}
  loop do
    # most likely cell, with a random tie-break
    ix = probs.each_with_index.map{|v,i|[v,rand,i]}.max.last
    probs[ix] = 0   # we just looked there and missed
    # one step of the walk; the borders fold the clamped moves back in
    probs = [probs[0] * (1 + PSTAY)/2 + probs[1] * (1 - PSTAY)/2,
             *probs.each_cons(3).map{|a, b, c| (a + c) / 2 * (1 - PSTAY) + b * PSTAY},
             probs[-1] * (1 + PSTAY)/2 + probs[-2] * (1 - PSTAY)/2]

    y.yield ix
  end
end

# Cache the single greedy sequence so every run replays the same order
# (the performance shortcut admitted above).
$sg_cache = []
def sg_enum; Enumerator.new{|y| $sg_cache.each{|n| y.yield n}; $sg_gen.each{|n| $sg_cache.push n; y.yield n}}; end

puts "smart greedy"
p stats WalkSim.bench(LIMIT, PSTAY, RUNS) {sg_enum}
Answered by John Dvorak on Oct 06 '22