Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient data structure for sparse data lookup

Tags:

c++

algorithm

Situation:

Given some points with coordinate (x, y) Range 0 < x < 100,000,000 and 0 < y <100,000,000

I have to find smallest square which contains at least N no of points on its edge and inside it.

  1. I used vector to store coordinates and searched all squares with side length minLength upto side length maxLength (Appling Brute Force in relevant space)

    struct Point
    {
            int x;
            int y;
    };
    
    vector<Point> P;
    int minLength = sqrt(N) - 1;
    int maxLength = 0;
    
    //   bigx= largest x coordinate of any point
    //   bigy= largest y coordinate of any point
    //   smallx= smallest x coordinate of any point
    //   smally= smallest y coordinate of any point
    
    (bigx - smallx) < (bigy - smally) ? maxLength = (bigx - smallx) : maxLength = (bigy - smally);
    
  2. For each square I looked up, traversed through complete vector to see if at least N points are on its edge and inside it.

This was quite time inefficient.

Q1. What data structure should I use to improve time efficiency without changing Algorithm I used?
Q2. Efficient Algorithm for this problem?

like image 799
NightFurry Avatar asked Jun 25 '14 07:06

NightFurry


People also ask

Which data structure is best for lookup?

Looking at complexity analysis, Hashtables seem to be the most efficient for lookups.

In which data structure lookup is fast?

With a hash table, you can access objects by the key, so this structure is high-speed for lookups. Hash tables are faster than the arrays for lookups.

What is sparse data structure?

Sparse data structures allow us to store only non-zero values assuming the rest of them are zeros. This approach saves a lot of memory and computing time. In fact, you can often encounter such matrices when working with NLP or machine learning tasks. In Python, sparse data structures are implemented in scipy.

Which data structure should be used to structure network traffic in an organization?

Bandwidth tree - A data structure for routing in networks with advanced reservations.


Video Answer


1 Answers

There are points on 2 opposite edges - if not, you could shrink the square by 1 and still contain the same number of points. That means the possible coordinates of the edges are limited to those of the input points. The input points are probably not on the corners, though. (For a minimum rectangle, there would be points on all 4 edges as you can shrink one dimension without altering the other)

The next thing to realize is that each point divides the plane in 4 quadrants, and each quadrant contains a number of points. (These can add up to more than the total number of points as the quadrants have one pixel overlap). Lets say that NW(p) is the number of points to the northwest of point p, i.e. those that have x>=px and y>=py. Then the number of points in a square is NW(bottomleft) + NW(topright) - NW(bottomright) - NW(topleft).

It's fairly easy to calculate NW(p) for all input points. Sort them by x and for equal x by y. The most northwestern point has NW(p)==0. The next point can have NW(p)==1 if it's to the southeast of the first point, else it has NW(p)==0. It's also useful to keep track of SW(p) in this stage, as you're working through the points from west to east and they're therefore not sorted north to south. Having calculated NW(p), you can determine the number of points in a square S in O(1)

Recall that the square size is restricted by by the need to have points on opposite edges. Assume the points are on the left (western) and right edge - you still have the points sorted by x order. Start by assuming the left edge is at your leftmost x coordinate, and see what the right edge must be to contain N points. Now shift the left edge to the next x coordinate and find a new right edge (and thus a new square). Do this until the right edge of the square is the rightmost point.

Its also possible that the square is constrained in y direction. Just sort the points in y direction and repeat, then choose the smallest square between the two outcomes.

Since you're running linearly through the points in x and y direction, that part is just O(N) and the dominant factor is the O(N log N) sort.

like image 173
MSalters Avatar answered Sep 30 '22 14:09

MSalters