Efficient data structure for sparse data lookup

Q: Which data structure is best for lookup?

Looking at complexity analysis, Hashtables seem to be the most efficient for lookups.

Q: In which data structure lookup is fast?

With a hash table, you can access objects by the key, so this structure is high-speed for lookups. Hash tables are faster than the arrays for lookups.

Q: What is sparse data structure?

Sparse data structures allow us to store only non-zero values assuming the rest of them are zeros. This approach saves a lot of memory and computing time. In fact, you can often encounter such matrices when working with NLP or machine learning tasks. In Python, sparse data structures are implemented in scipy.

Q: Which data structure should be used to structure network traffic in an organization?

Bandwidth tree - A data structure for routing in networks with advanced reservations.

Tags:

c++

algorithm

Situation:

Given some points with coordinate (x, y) Range 0 < x < 100,000,000 and 0 < y <100,000,000

I have to find smallest square which contains at least N no of points on its edge and inside it.

I used vector to store coordinates and searched all squares with side length minLength upto side length maxLength (Appling Brute Force in relevant space)

struct Point
{
        int x;
        int y;
};

vector<Point> P;
int minLength = sqrt(N) - 1;
int maxLength = 0;

//   bigx= largest x coordinate of any point
//   bigy= largest y coordinate of any point
//   smallx= smallest x coordinate of any point
//   smally= smallest y coordinate of any point

(bigx - smallx) < (bigy - smally) ? maxLength = (bigx - smallx) : maxLength = (bigy - smally);

For each square I looked up, traversed through complete vector to see if at least N points are on its edge and inside it.

This was quite time inefficient.

Q1. What data structure should I use to improve time efficiency without changing Algorithm I used?
Q2. Efficient Algorithm for this problem?

799

asked Jun 25 '14 07:06

NightFurry

Video Answer

1 Answers

There are points on 2 opposite edges - if not, you could shrink the square by 1 and still contain the same number of points. That means the possible coordinates of the edges are limited to those of the input points. The input points are probably not on the corners, though. (For a minimum rectangle, there would be points on all 4 edges as you can shrink one dimension without altering the other)

The next thing to realize is that each point divides the plane in 4 quadrants, and each quadrant contains a number of points. (These can add up to more than the total number of points as the quadrants have one pixel overlap). Lets say that NW(p) is the number of points to the northwest of point p, i.e. those that have x>=px and y>=py. Then the number of points in a square is NW(bottomleft) + NW(topright) - NW(bottomright) - NW(topleft).

It's fairly easy to calculate NW(p) for all input points. Sort them by x and for equal x by y. The most northwestern point has NW(p)==0. The next point can have NW(p)==1 if it's to the southeast of the first point, else it has NW(p)==0. It's also useful to keep track of SW(p) in this stage, as you're working through the points from west to east and they're therefore not sorted north to south. Having calculated NW(p), you can determine the number of points in a square S in O(1)

Recall that the square size is restricted by by the need to have points on opposite edges. Assume the points are on the left (western) and right edge - you still have the points sorted by x order. Start by assuming the left edge is at your leftmost x coordinate, and see what the right edge must be to contain N points. Now shift the left edge to the next x coordinate and find a new right edge (and thus a new square). Do this until the right edge of the square is the rightmost point.

Its also possible that the square is constrained in y direction. Just sort the points in y direction and repeat, then choose the smallest square between the two outcomes.

Since you're running linearly through the points in x and y direction, that part is just O(N) and the dominant factor is the O(N log N) sort.

173

answered Sep 30 '22 14:09

MSalters

Related questions
                            
                                Difference between const and constexpr arrays
                            
                                Should a name found in a dependent base class hide a namespace name at the point of instantiation?
                            
                                Can I load a library from a memory stream?
                            
                                What are the rules for dynamic dispatching in c++?
                            
                                Link different C++ standard libraries on Mac OS X
                            
                                How to get wifi signal strength using Qt?
                            
                                ADsOpenObject() returns -2147024882 (0x8007000E) -> OUT_OF_MEMORY
                            
                                constexpr: errors with floating point representation?
                            
                                Performance difference between branch prediction and branch target prediction?
                            
                                C++ Strange behavior with double comparaison
                            
                                Multithreaded console I/O
                            
                                How to use a Python list to assign a std::vector in C++ using SWIG?
                            
                                What is an elephant number afterall?
                            
                                Undefined reference with clang++ with O2
                            
                                T = char can't be deduced for std::basic_string<T> foo = "foo"?
                            
                                Make custom intermediate (improve C compilable tester) in CMake
                            
                                Creating an sqlite3.lib file / using sqlite3 in Visual Studios [closed]
                            
                                C++ inline assembly (Intel compiler): LEA and MOV behaving differently in Windows and Linux
                            
                                g++ compiling C++11 using -Wpedantic option: Is there an option to disable only the warning about unnamed structs?
                            
                                Declaring constexpr functions or methods

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With