I have a data set consisting of hundreds of millions of data points. I'd like to be able to effectively render such a set depending on the zoom level (i.e. axis scale). I'd like to be able to have a sampled subset render at the full view. As you zoom in, you'll be able to see more detailed data points until you reach maximum zoom, at that point you'll be able to see individual data points. What would be a good data structure to store such a data set and allows multi resolution access?
You need to keep your points spatially indexed, because "outlier" and "density" are spatial properties -- an outlier is a point that happens to be in a low-density area; and "zooming out" would mean replacing sets of close-together points for 'sampled' points; and when "zooming in" you really, really want to ignore all those points that fall outside the current window. Your operations could be something like:
void addPoint(Point2D p);
void removePoint(Point2D p);
Iterator<Point2D> getPointsToPaint(Rectangle2D viewArea, int maxDensity, double densityArea);
where the viewArea
represents the window you want to find points for, and the maxDensity
parameter could be used to control point abstraction: when more than maxDensity points fall within a densityArea
square, you return maxDensity
random points within that area instead. getPointsToPaint
would then cover your viewArea
with densityArea
sampling boxes and return the points within: the real points if less than maxDensity
, and the "sampled" ones if over maxDensity
(nobody will notice if 10 points within a 1mm2 area are random or not).
Typical spatial structures are quad-trees (for 2d) and kd-trees (for any number of ds). However, in their default implementations, neither of them is too good for quickly-changing dynamic data. Another option is to use spatial hashing; but you really seem to need a multi-level approach, and for multi-level, trees are always the way to go. From a quick review of search results for "dynamic spatial indexing", it seems that a variant of the r-tree may be what you are looking for. Beware that these data-structures are not easy to implement from scratch. The best approach may be to rely on an external GIS system to do the bookkeeping for you. Several Java GISs are available.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With