Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Searching an array for integers in a specific range

Can someone give me ideas for the following problem?

Given an array ar[] of length n, and some queries, each query is of the form a, b, c, find the number with the smallest index i such that the index lies in the range [c, n] and such that a < ar[i] < b. There are (n - 1) in total, c goes from 1 to n - 1. The expected complexity for each query should be about O(log n), and a precomputation of complexity at most O(n log n) is suitable. Intuitively, segment tree came up to my mind, but I couldn't think of a way of building it, nor what to keep in each node.

like image 739
Chris Avatar asked Dec 14 '11 12:12

Chris


People also ask

How do you search an array for a specific element?

Use filter if you want to find all items in an array that meet a specific condition. Use find if you want to check if that at least one item meets a specific condition. Use includes if you want to check if an array contains a particular value. Use indexOf if you want to find the index of a particular item in an array.

What is the most efficient way to search an array of numbers?

Sequential search is the best that we can do when trying to find a value in an unsorted array. 1 But if the array is sorted in increasing order by value, then we can do much better. We use a process called binary search.


2 Answers

Ok, I thought I should implement the O(nlogn)/O(logn) solution I discussed with user12861.

The code works by building n trees, one for each c. Each tree shares most of its content with the smaller ones, with a maximum of logn new nodes. This way the total memory and time cost for preprocessing is limited to O(nlogn).

The actual trees are quite similar to interval trees. The nodes store the minimum and maximum value they span over, and the smallest index they contain. My implementation is however not self-balancing, but this can be added using your favorite heuristics.

import sys

class Node:
    pass
class Interval(Node):
    def __init__(self,left,val,i,right):
        self.i = i; self.val = val
        self.left = left; self.right = right
        self.mini = min(left.mini, i, right.mini)
        self.min = min(left.min, val, right.min)
        self.max = max(left.max, val, right.max)
    def add(self,val,i):
        # We do naive inserting here. In a real worst case logn solution,
        # self-balancing would take place here.
        if (val,i) < (self.val,self.i): return Interval(self.left.add(val,i), self.val, self.i, self.right)
        if (self.val,self.i) < (val,i): return Interval(self.left, self.val, self.i, self.right.add(val,i))
        return self
    def query(self,a,b):
        # If the entire interval is covered, return mini, this is what keeps us
        # at max 2 opened nodes per level in the tree.
        if a <= self.min and self.max <= b: return self.mini
        # Otherwise compose an answer from the subtrees.
        res = sys.maxint
        if a <= self.val <= b: res = min(res, self.i)
        if self.left.min <= b and self.left.max >= a: res = min(res, self.left.query(a,b))
        if self.right.min <= b and self.right.max >= a: res = min(res, self.right.query(a,b))
        return res
class Tip(Node):
    def __init__(self):
        # This is a typical null-pattern idiom. The self.values are set to
        # the min/max identities to ensure to interference with the upwards
        # value propagation.
        self.mini = sys.maxint
        self.min = sys.maxint
        self.max = -sys.maxint
    def add(self,val,i):
        return Interval(Tip(), val, i, Tip())
    def query(self,a,b):
        return sys.maxint

# The input data array
data = [0, 3, 1, 2, 0, 4, 9, 6, 1, 7]
N = len(data)

# Build the array of trees. O(nlogn)
ar = [None]*N + [Tip()]
for i in range(N-1, -1, -1):
    ar[i] = ar[i+1].add(data[i],i)

# The query function. O(logn)
def query(a,b,c):
    return ar[c].query(a,b)

# Test
print query(2, 7, 4) # = 5
like image 174
Thomas Ahle Avatar answered Oct 17 '22 07:10

Thomas Ahle


I've modified Mason Bryant's technique to something that works. The problems were more bugs in FindLowestIndex and a more major bug searching the tree (it can return multiple results).

Despite doing that work, it still doesn't actually solve the problem. O(n log n) time setting up is easy enough, but using this technique I'm only able to get O((log n)^2) query time. I wonder if you have a link to the original problem in case there are more clarifications there? Or I wonder if the problem is really solvable. Or maybe O((log n)^2) is "about" O(log n) as the problem requests, it's less than O(n) anyway.

The technique is to store our array in a typical segment tree, but along with the usual segment information we also store, in order, all the indexes of the elements under each node. This extra storage only takes an extra O(n log n) time/space if you add it up properly (n items stored at each of log n levels), so it doesn't hurt our setup time in any way that matters. Then we query the tree to find the minimum set of nodes that are contained by our range of (a, b). This query takes about the same time as a typical segment tree query (O(log n)) and finds at most about 2*log n matching segments. As we query, we find the lowest index in each matching segment that matches our constraint c. We can use a binary search to find this index since we kept the indices in order, so it takes O(log n) time worst case for each matching node. When we add it all up appropriately, the total time is O((log n)^2).

Let me know whichever steps you'd like clarified.

C# Code:

void Test() {
  DateTime start;
  TimeSpan at = new TimeSpan(), bt = new TimeSpan();

  Random rg = new Random();
  for(int i = 0; i < 20; i++) {
    // build arrays from 5 to 10000 elements long, random values between 0 and 100
    // Break even time for queries is around 10000 elements in array for the values and queries used
    int[] array = (from n in Enumerable.Range(1, rg.Next(5, 10000)) select rg.Next(0, 100)).ToArray<int>();

    // Setup should be O(n log n) time/space
    ArraySearcher Searcher = new ArraySearcher(array);

    // Test each array a number of times equal to its length, with random values for a, b, c
    for(int j = 0; j < array.Length; j++) {
      int a = rg.Next(-1, 101), b = rg.Next(a, 102), c = rg.Next(0, array.Length);
      start = DateTime.Now;
      int expected = BruteSearch(array, a, b, c);
      at += DateTime.Now - start;
      // Search should be O(log n) time, but in reality is O((log n)^2) :(
      start = DateTime.Now;
      int got = Searcher.QuickSearch(a, b, c);
      bt += DateTime.Now - start;
      System.Diagnostics.Debug.Assert(got == expected);
    }
  }
  MessageBox.Show(at.ToString() + ", " + bt.ToString());
}

int BruteSearch(int[] array, int a, int b, int c) {
  for(int i = c; i < array.Length; i++)
    if(a < array[i] && array[i] < b)
      return i;
  return -1;
}

class ArraySearcher {

  SegmentNode Root;
  List<SegmentNode> Nodes;

  public ArraySearcher(int[] array) {
    Nodes = array.Select((value, index) => new SegmentNode(value, value, new List<int> { index }, null, null)).ToList<SegmentNode>();
    // Sorting will take us O(n log n)
    Nodes.Sort();
    // Creating a normal segment tree takes O(n log n)
    // In addition, in this tree each node stores all the indices below it in order
    // There are a total of n of these indices at each tree level, kept in order as we go at no extra time cost
    // There are log n levels to the tree
    // So O(n log n) extra time and space is spent making our lists, which doesn't hurt our big O notation compared to just sorting
    this.Root = SegmentNode.Merge(Nodes, 0, Nodes.Count - 1);
  }

  public int QuickSearch(int a, int b, int c) {
    return this.Root.FindLowestIndex(a, b, c);
  }

  class SegmentNode : IComparable {

    public int Min, Max;
    public List<int> Indices;
    public SegmentNode Left, Right;

    public SegmentNode(int Min, int Max, List<int> Indices, SegmentNode Left, SegmentNode Right) {
      this.Min = Min;
      this.Max = Max;
      this.Indices = Indices;
      this.Left = Left;
      this.Right = Right;
    }

    int IComparable.CompareTo(object other) {
      return this.Min - ((SegmentNode)other).Min;
    }

    public static SegmentNode Merge(List<SegmentNode> Nodes, int Start, int End) {
      int Count = End - Start;
      if(Start == End)
        return Nodes[Start];
      if(End - Start == 1)
        return SegmentNode.Merge(Nodes[Start], Nodes[End]);
      return SegmentNode.Merge(SegmentNode.Merge(Nodes, Start, Start + Count/2), SegmentNode.Merge(Nodes, Start + Count/2 + 1, End));
    }

    public static SegmentNode Merge(SegmentNode Left, SegmentNode Right) {
      int LeftCounter = 0, RightCounter = 0;
      List<int> NewIndices = new List<int>();
      while(LeftCounter < Left.Indices.Count || RightCounter < Right.Indices.Count) {
        if(LeftCounter < Left.Indices.Count && (RightCounter == Right.Indices.Count || Left.Indices[LeftCounter] < Right.Indices[RightCounter]))
          NewIndices.Add(Left.Indices[LeftCounter++]);
        else
          NewIndices.Add(Right.Indices[RightCounter++]);
      }
      return new SegmentNode(Left.Min, Right.Max, NewIndices, Left, Right);
    }

    public int FindLowestIndex(int SeekMin, int SeekMax, int c) {
      // This will find at most O(log n) matching segments, always less than 2 from each level of the tree
      // Each matching segment is binary searched in at worst O(log n) time
      // Total does indeed add up to O((log n)^2) if you do it right
      if(SeekMin < this.Min && SeekMax > this.Max)
        return FindLowestIndex(this.Indices, c);
      if(SeekMax <= this.Min || SeekMin >= this.Max)
        return -1;
      int LeftMatch = this.Left.FindLowestIndex(SeekMin, SeekMax, c);
      int RightMatch = this.Right.FindLowestIndex(SeekMin, SeekMax, c);
      if(LeftMatch == -1)
        return RightMatch;
      if(RightMatch == -1)
        return LeftMatch;
      return LeftMatch < RightMatch ? LeftMatch : RightMatch;
    }

    int FindLowestIndex(List<int> Indices, int c) {
      int left = 0, right = Indices.Count - 1, mid = left + (right - left) / 2;
      while(left <= right) {
        if(Indices[mid] == c)
          return c;
        if(Indices[mid] > c)
          right = mid - 1;
        else
          left = mid + 1;
        mid = left + (right - left) / 2;
      }
      if(mid >= Indices.Count)
        return -1;
      // no exact match, so return the next highest.
      return Indices[mid];
    }
  }
}
like image 42
user12861 Avatar answered Oct 17 '22 08:10

user12861