Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is a good data structure for a multi resolution graph?

I have a data set consisting of hundreds of millions of data points. I'd like to be able to effectively render such a set depending on the zoom level (i.e. axis scale). I'd like to be able to have a sampled subset render at the full view. As you zoom in, you'll be able to see more detailed data points until you reach maximum zoom, at that point you'll be able to see individual data points. What would be a good data structure to store such a data set and allows multi resolution access?

like image 907
z - Avatar asked Nov 01 '22 10:11

z -


1 Answers

You need to keep your points spatially indexed, because "outlier" and "density" are spatial properties -- an outlier is a point that happens to be in a low-density area; and "zooming out" would mean replacing sets of close-together points for 'sampled' points; and when "zooming in" you really, really want to ignore all those points that fall outside the current window. Your operations could be something like:

void addPoint(Point2D p);
void removePoint(Point2D p);
Iterator<Point2D> getPointsToPaint(Rectangle2D viewArea, int maxDensity, double densityArea);

where the viewArea represents the window you want to find points for, and the maxDensity parameter could be used to control point abstraction: when more than maxDensity points fall within a densityArea square, you return maxDensity random points within that area instead. getPointsToPaint would then cover your viewArea with densityArea sampling boxes and return the points within: the real points if less than maxDensity, and the "sampled" ones if over maxDensity (nobody will notice if 10 points within a 1mm2 area are random or not).

Typical spatial structures are quad-trees (for 2d) and kd-trees (for any number of ds). However, in their default implementations, neither of them is too good for quickly-changing dynamic data. Another option is to use spatial hashing; but you really seem to need a multi-level approach, and for multi-level, trees are always the way to go. From a quick review of search results for "dynamic spatial indexing", it seems that a variant of the r-tree may be what you are looking for. Beware that these data-structures are not easy to implement from scratch. The best approach may be to rely on an external GIS system to do the bookkeeping for you. Several Java GISs are available.

like image 117
tucuxi Avatar answered Nov 10 '22 16:11

tucuxi