Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

D3.js scattergraph with large (>500,000) points? Clustering?

I'm looking at plotting a scatterplot with a large number of points (500,000 and upwards).

Currently, we're doing this in Python with Matplotlib. It plots the points, and it provides controls to pan and zoom. I don't believe it provides any clustering or points, it just plots them all - doesn't make much sense at the zoomed out view, I suppose, but you can zoom in and they're all there.

I was looking at doing the chart in JavaScript, to make it a bit easier to distribute. I was looking at D3.js, to see if something similar is feasible there. I did find this example of a basic scatterplot:

http://bl.ocks.org/mbostock/3887118

Firstly, would you be able to plot that number of points? (500,000 and upwards) I was under the impression you couldn't due to the overhead of all the DOM objects? Are there ways around this?

Secondly, is there any kind of clustering available, either a library or even just an example of this being done in D3.js?

Thirdly, if anybody knows any good examples of pan/zoom functionality and clustering, or even just a packaged JS library that handles it, that would be awesome.

Fourth, it would be also nice to have click handlers for each point - and to display some text either in a overlay, or even just in a separate window. Any thoughts on this?

like image 251
victorhooi Avatar asked Dec 19 '14 20:12

victorhooi


2 Answers

Can you draw half a million points with D3? Sure, but not with SVG. You'll have to use canvas (here's a simple example with 10,000 points that includes brush-based selection: http://bl.ocks.org/emeeks/306e64e0d687a4374bcd) and that means that you no longer have individual elements to assign click handlers to. You will not be able to render half a million points with SVG, because all those DOM elements will choke your interface, as you mentioned.

D3 does include quadtree support that can be leveraged for clustering. It's in use in the above example to speed up search but you could use it to nest elements in proximity at certain scales.

Ultimately, your choices are:

1) Some other library/custom implementation that renders in canvas and polls the mouse position to give you the data element rendered at that point.

2) A sophisticated custom D3 approach that nests elements in proximity and only renders SVG elements appropriate at the zoom level and canvas position (pan) you're at.

like image 126
Elijah Avatar answered Sep 18 '22 22:09

Elijah


Yes, D3.js can be made to work with million scale data with two things:

  1. pre-rendering on the server side. For more see here: https://mango-is.com/blog/engineering/pre-render-d3-js-charts-at-server-side/

    • By aggregating (or clustering) part of the data so that user can interact and expand the graph if need be. For this use collapsible nodes if you can (http://bl.ocks.org/mbostock/1062288).

    • Also avoid using force layout. It takes time to settle and converge to a stable positioning.

For clustering libraries, I would pick one up off the shelf. I would choose the scikits library from python, there are many in JavaScript but they are not very robust as they mostly cover k-means or hierarchical clustering. I would precalculate the coordinates using scikits by clustering and then render it using D3.

D3 handles Pan and zoom. Again click handlers and text display are available in D3. (http://bl.ocks.org/robschmuecker/7880033)

like image 39
Abhimanu Kumar Avatar answered Sep 17 '22 22:09

Abhimanu Kumar