Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Idea for a data structure to store 2D data?

I have a large 2D grid, x-by-y. The user of the application will add data about specific points on this grid. Unfortunately, the grid is far too big to be implemented as a large x-by-y array because the system on which this is running does not have enough memory.

What is a good way to implement this so that only the points that have data added to them are stored in memory?

My first idea was to create a BST of the data points. A hash function such as "(long)x<<32 + y" would be used to compare the nodes.

I then concluded that this could lose efficiency if not well balanced so I came up with the idea of having a BST of comparable BSTs of points. The outer BST would compare the inner BSTs based on their x values. The inner BSTs would compare the points by their y values (and they would all have the same x). So when the programmer wants to see if there is a point at (5,6), they would query the outer BST for 5. If an inner BST exists at that point then the programmer would query the inner BST for 6. The result would be returned.

Can you think of any better way of implementing this?

Edit: In regards to HashMaps: Most HashMaps require having an array for the lookup. One would say "data[hash(Point)] = Point();" to set a point and then find the Point by hashing it to find the index. The problem, however, is that the array would have to be the size of the range of the hash function. If this range is less than the total number of data points that are added then they would either have no room or have to be added to an overflow. Because I don't know the number of points that will be added, I would have to make an assumption that this number would be less than a certain amount and then set the array to that size. Again, this instantiates a very large array (although smaller than originally if the assumption is that there will be less data points than x*y). I would like the structure to scale linearly with the amount of data and not take up a large amount when empty.

It looks like what I want is a SparseArray, as some have mentioned. Are they implemented similarly to having a BST inside of a BST?

Edit2: Map<> is an interface. If I were to use a Map then it looks like TreeMap<> would be the best bet. So I would end up with TreeMap< TreeMap< Point> >, similar to the Map< Map< Point> > suggestions that people have made, which is basically a BST inside of a BST. Thanks for the info, though, because I didn't know that the TreeMap<> was basically the Java SDK of a BST.

Edit3: For those whom it may concern, the selected answer is the best method. Firstly, one must create a Point class that contains (x,y) and implements comparable. The Point could potentially be compared by something like (((long)x)<<32)+y). Then one would TreeMap each point to the data. Searching this is efficient because it is in a balanced tree so log(n) cost. The user can also query all of this data, or iterate through it, by using the TreeMap.entrySet() function, which returns a set of Points along with the data.

In conclusion, this allows for the space-efficient and search-efficient implementation of a sparse array, or in my case, a 2D array, that can also be iterated through efficiently.

like image 332
Reed B Avatar asked Jun 21 '13 16:06

Reed B


2 Answers

Either a Quadtree, a k-d-tree or an R-tree.

Store index to large point array into one of the spatial structures. Such spatial structures are advantageous if the data is not equally distributed, like geographic data that concentrates in cities, and have no point in the sea.

Think if you can forget the regular grid, and stay with the quad tree.
(Think, why do you need a regular grid? A regular grid is usually only a simplification)

Under no circumstances use Objects to store a Point. Such an Object needs 20 bytes only for the fact that it is an object! A bad idea for a huge data set.

An int x[], and int[] y, or an int[]xy array is ideal related to memory usage.

Consider reading

Hanan Samet's "Foundations of Multidimensional Data Structures"

(at least the Introduction).

like image 189
AlexWien Avatar answered Oct 08 '22 12:10

AlexWien


You could use a Map<Pair, Whatever> to store your data (you have to write the Pair class). If you need to iterate the data in some specific order, make Pair Comparable, and use NavigableMap

like image 21
jmruc Avatar answered Oct 08 '22 13:10

jmruc