I have a large dataset comprising 10^5 data points, and I'm considering the following question about visualizing it:
Is there an efficient way to visualize a very large dataset? In my case I have a set of users, and each user has 10^3 items. There are 10^5 items in total. I want to show all of a user's items at once to enable quick comparison between users. Somebody suggested using a list, but I don't think a list is the only choice when dealing with a dataset this big.
Note
I want to show all the items for each user at a time.
This means I want to show all the data points when I click on a user, and when I click on two users, I can compare the differences between their data points.
SVG charts can typically handle around 1,000 datapoints. Since D3 v4 you've also had the option to render charts using canvas, which is an immediate mode graphics model. With Canvas you can expect to render around 10,000 datapoints whilst maintaining smooth 60fps interactions.
D3 modules such as d3-quadtree or d3-time-format aren't SVG- or Canvas-specific, as they don't deal with the DOM or rendering at all. Modules such as d3-hierarchy don't actually render anything either, but provide the information needed to render in either Canvas or SVG.
You could use D3 to do the data transformations, and use WebGL to do the rendering, for sure.
Rendering them is not the problem. You could switch to canvas or WebGL for the rendering part. You can find some examples of using canvas and X3DOM with D3 data-binding. But data-binding will be slow because of the number of DOM objects, so it's better to keep the data and the rendering separated, as in this parallel coordinates example. That example also features progressive rendering to load and render all the data elements.
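To illustrate the progressive rendering idea, here is a minimal sketch, not the code from the example mentioned above. The names `renderProgressively`, `draw`, and `schedule` are hypothetical; the assumption is that drawing one small batch per frame keeps the page responsive while the full dataset fills in.

```typescript
// Hypothetical helper: draw a large array in fixed-size batches,
// one batch per scheduled tick, instead of all at once.
function renderProgressively<T>(
  items: readonly T[],
  batchSize: number,
  draw: (batch: T[]) => void,           // caller-supplied: paints one batch
  schedule: (step: () => void) => void, // caller-supplied: defers the next step
): void {
  if (batchSize <= 0) throw new Error("batchSize must be positive");
  let start = 0;
  const step = (): void => {
    if (start >= items.length) return;  // everything has been drawn
    draw(items.slice(start, start + batchSize));
    start += batchSize;
    schedule(step);                     // queue the next batch
  };
  schedule(step);
}
```

In a browser you would probably pass something like `(step) => requestAnimationFrame(() => step())` as `schedule`, and a `draw` function that paints the batch onto a canvas context.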
Keeping them in memory and manipulating them client-side is not a problem either. D3 is often used with Crossfilter for quick data manipulation of "a million or more records".
10^5 data points is just slightly too many for interactive SVG rendering. But too many data points in a visualization is often a hint that you have the wrong level of abstraction or the wrong plotting strategy. A lot of points will probably overlap or visually fuse. So why not aggregate these shapes, for example using a heatmap (a color scale for the number of overlapping points), binning (hexbin, histogram), or summarizing the dataset?
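As a rough sketch of the binning idea, the snippet below counts points per cell of a rectangular grid; each count could then drive a color scale for a heatmap. The `Point` shape, the `binPoints` name, and the assumption that coordinates lie in `[0, width] x [0, height]` are all mine. A hexbin layout would work similarly with hexagonal cells.

```typescript
interface Point { x: number; y: number; }

// Hypothetical helper: count points per cell of a cols x rows grid.
// Assumes coordinates lie in [0, width] x [0, height]; points outside
// that range are clamped into the nearest edge cell.
function binPoints(
  points: readonly Point[],
  cols: number,
  rows: number,
  width: number,
  height: number,
): number[][] {
  const counts: number[][] = [];
  for (let r = 0; r < rows; r++) {
    const row: number[] = [];
    for (let c = 0; c < cols; c++) row.push(0);
    counts.push(row);
  }
  for (let i = 0; i < points.length; i++) {
    const p = points[i];
    const col = Math.min(cols - 1, Math.max(0, Math.floor((p.x / width) * cols)));
    const row = Math.min(rows - 1, Math.max(0, Math.floor((p.y / height) * rows)));
    counts[row][col] += 1;
  }
  return counts;
}
```

With 10^5 points and, say, a 100 x 100 grid, you would draw at most 10^4 rectangles instead of 10^5 marks, whatever the dataset size.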
If what you want is an overview and a way to compare datasets, you probably need an abstraction, such as some statistics summarizing each dataset, and then show details on demand (semantic zoom, focus+context, drill-down).
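One possible summary, assuming each user's items can be reduced to a numeric value, is a five-number summary per user (minimum, quartiles, maximum), which could be drawn as one box plot per user for the overview. The `summarize` and `quantile` names are hypothetical, and linear interpolation between neighbouring values is just one of several common quantile conventions.

```typescript
interface Summary { min: number; q1: number; median: number; q3: number; max: number; }

// Quantile of an ascending-sorted array, using linear interpolation
// between the two nearest ranks (one common convention among several).
function quantile(sorted: readonly number[], p: number): number {
  const idx = (sorted.length - 1) * p;
  const lo = Math.floor(idx);
  const hi = Math.ceil(idx);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (idx - lo);
}

// Hypothetical helper: reduce one user's values to five numbers.
function summarize(values: readonly number[]): Summary {
  if (values.length === 0) throw new Error("cannot summarize an empty array");
  const sorted = values.slice().sort((a, b) => a - b); // copy, then numeric sort
  return {
    min: sorted[0],
    q1: quantile(sorted, 0.25),
    median: quantile(sorted, 0.5),
    q3: quantile(sorted, 0.75),
    max: sorted[sorted.length - 1],
  };
}
```

Comparing two users then means comparing two small summaries, and the full 10^3 items only need to be shown when someone drills into a single user.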