 

Reduce the size of a large data set by sampling/interpolation to improve chart performance

I have a large set (>2000) of time series data that I'd like to display using d3 in the browser. D3 is working great for displaying a subset of the data (~100 points) to the user, but I also want a "context" view (like this) to show the entire data set and allow users to select a subregion to view in detail.

However, performance is abysmal when trying to display that many points in d3. I feel like a good solution would be to select a sample of the data and then use some kind of interpolation (spline, polynomial, etc.; this is the part I know how to do) to draw a curve that is reasonably similar to the actual data.

However, it's not clear to me how I ought to go about selecting the subset. The data (shown below) has rather flat regions where fewer samples would be needed for a decent interpolation, and other regions where the absolute derivative is quite high, where more frequent sampling is needed.

To further complicate matters, the data has gaps (where the sensor generating it was failing or out of range), and I'd like to keep these gaps in the chart rather than interpolating through them. Detection of the gaps is fairly simple though, and simply clipping them out after drawing the entire data set with the interpolation seems like a reasonable solution.
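For what it's worth, the gap detection I have in mind is something along these lines (a rough sketch only; splitAtGaps and maxDelta are made-up names, and the assumption is that points are {x, y} objects sorted by x, with a gap being any jump in x larger than some threshold):

// Rough sketch: split the series into segments wherever the time step jumps.
// Assumes points are {x, y} objects sorted by x; maxDelta is an illustrative
// threshold for what counts as a gap.
function splitAtGaps(data, maxDelta) {
    if (!data.length) return [];
    var segments = [[data[0]]];
    for (var i = 1; i < data.length; i++) {
        if (data[i].x - data[i - 1].x > maxDelta) {
            segments.push([]);                    // a gap starts a new segment
        }
        segments[segments.length - 1].push(data[i]);
    }
    return segments;                              // draw one path per segment
}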

I'm doing this in JavaScript, but a solution in any language or a mathematical answer to the problem would do.

the data in question

asked Jan 15 '15 by jjm


2 Answers

You could use the d3fc-sample module, which provides a number of different algorithms for sampling data. Here's what the API looks like:

// Create the sampler
var sampler = fc_sample.largestTriangleThreeBucket();

// Configure the x / y value accessors
sampler.x(function (d) { return d.x; })
    .y(function (d) { return d.y; });

// Configure the size of the buckets used to downsample the data.
sampler.bucketSize(10);

// Run the sampler
var sampledData = sampler(data);

You can see an example of it running on the website:

https://d3fc.io/examples/sample/
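The sampled output is just an array, so it can be fed straight into a d3 line generator. A minimal sketch, assuming d3 v4+ and that xScale, yScale and the 'path.context-line' selection are already set up (those names are illustrative, not part of d3fc):

// Draw the downsampled data with an ordinary d3 line generator
var line = d3.line()
    .x(function (d) { return xScale(d.x); })
    .y(function (d) { return yScale(d.y); });

d3.select('path.context-line')
    .datum(sampledData)
    .attr('d', line);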

The largest-triangle-three-buckets (LTTB) algorithm works quite well on data that is 'patchy'. It doesn't vary the bucket size, but it does ensure that peaks and troughs are included, which results in a sample that represents the original data well.
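For reference, here is a rough sketch of the LTTB idea itself (an illustration of the technique, not the d3fc implementation), assuming data points are {x, y} objects with numeric x sorted in ascending order and threshold is the desired number of output points:

// Rough sketch of largest-triangle-three-buckets (illustrative only).
// Keeps the first and last points, then keeps one point per bucket: the one
// forming the largest triangle with the previously kept point and the
// average of the next bucket.
function lttb(data, threshold) {
    if (threshold >= data.length || threshold < 3) return data.slice();

    var sampled = [data[0]];                      // always keep the first point
    var bucketSize = (data.length - 2) / (threshold - 2);
    var a = 0;                                    // index of the last kept point

    for (var i = 0; i < threshold - 2; i++) {
        // Average of the *next* bucket, used as the third triangle vertex
        var nextStart = Math.floor((i + 1) * bucketSize) + 1;
        var nextEnd = Math.min(Math.floor((i + 2) * bucketSize) + 1, data.length);
        var avgX = 0, avgY = 0;
        for (var j = nextStart; j < nextEnd; j++) {
            avgX += data[j].x;
            avgY += data[j].y;
        }
        avgX /= (nextEnd - nextStart);
        avgY /= (nextEnd - nextStart);

        // Keep the point in the current bucket that forms the largest
        // triangle with the last kept point and the next bucket's average
        var start = Math.floor(i * bucketSize) + 1;
        var end = Math.floor((i + 1) * bucketSize) + 1;
        var maxArea = -1, maxIndex = start;
        for (var k = start; k < end; k++) {
            var area = Math.abs(
                (data[a].x - avgX) * (data[k].y - data[a].y) -
                (data[a].x - data[k].x) * (avgY - data[a].y)
            ) / 2;
            if (area > maxArea) { maxArea = area; maxIndex = k; }
        }
        sampled.push(data[maxIndex]);
        a = maxIndex;
    }

    sampled.push(data[data.length - 1]);          // always keep the last point
    return sampled;
}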

answered by ColinE


I know this doesn't answer your question entirely, but this library might help you simplify your line during rendering. Not sure whether it handles data gaps, though.

http://mourner.github.io/simplify-js/
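Usage is essentially a one-liner; a minimal sketch, assuming points are {x, y} objects and a tolerance in the same units as your data (the value here is just illustrative):

// simplify(points, tolerance, highQuality) returns a reduced copy of the array;
// passing true runs the slower, higher-quality pass
var simplified = simplify(points, 1.5, true);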

answered by Glenn