
Graph plotting: only keeping most relevant data

In order to save bandwidth and to avoid having to generate pictures/graphs ourselves, I plan on using Google's charting API:

http://code.google.com/apis/chart/

which works by simply issuing a (potentially long) GET request (or a POST), after which Google generates and serves the graph itself.
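
For illustration, a line-chart request is just a URL with the data encoded in the query string. Here's a rough sketch (TypeScript; the values are made up, and the cht/chs/chd parameters follow the Chart API documentation of the time):

    // Build a Google Chart API URL for a simple line chart.
    const values = [10, 25, 18, 42, 30];
    const url =
      "http://chart.apis.google.com/chart" +
      "?cht=lc" +                    // cht=lc: line chart
      "&chs=400x200" +               // chs: width x height in pixels
      "&chd=t:" + values.join(",");  // chd=t:...: text-encoded data series
    // Using this URL as an <img> src renders the generated graph.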

As of now my graphs are made of about two thousand entries, and I'd like to trim this down to some arbitrary number of entries (e.g. keeping only 50%, or 10%, of the original entries).

How can I decide which entries I should keep so as to have my new graph the closest to the original graph?

Is this some kind of curve-fitting problem?

Note that I know I can POST up to 16K of data to Google's chart API, and this may be enough for my needs, but I'm still curious.
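
For completeness, a hedged sketch of the POST variant: the parameters are the same, only moved into the request body to sidestep URL length limits (fetch is used purely for illustration):

    // Send the same chart parameters as a form-encoded POST body.
    const params = new URLSearchParams({
      cht: "lc",               // line chart
      chs: "400x200",          // size in pixels
      chd: "t:10,25,18,42,30", // text-encoded data series
    });
    const resp = await fetch("http://chart.apis.google.com/chart", {
      method: "POST",
      body: params, // sent as application/x-www-form-urlencoded
    });
    // The response body is the chart image, as with the GET variant.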

asked Jan 12 '11 by SyntaxT3rr0r

1 Answer

The flot-downsample plugin for the Flot JavaScript graphing library could do what you are looking for, up to a point.

Its purpose is to retain the visual characteristics of the original line while using considerably fewer data points.

The research behind this algorithm is documented in the author's thesis.

Note that it doesn't work for every kind of series, and in my experience it won't give meaningful results beyond a downsampling factor of about 10.

The problem is that it cuts the series into windows of equal size and then keeps one point per window. Since some windows may contain denser data than others, the result is not necessarily optimal, but it is efficient (it runs in linear time).
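
The algorithm behind the plugin is Largest-Triangle-Three-Buckets (LTTB). Here is a minimal sketch of the idea in TypeScript, assuming points sorted by x; this illustrates the technique, it is not the plugin's actual code:

    type Point = { x: number; y: number };

    // Downsample `data` to `threshold` points: keep the first and last
    // points, split the rest into equal-size buckets, and from each bucket
    // keep the point forming the largest triangle with the previously kept
    // point and the average of the next bucket.
    function downsample(data: Point[], threshold: number): Point[] {
      if (threshold >= data.length || threshold < 3) return data.slice();

      const out: Point[] = [data[0]];
      const bucketSize = (data.length - 2) / (threshold - 2);
      let prev = data[0];

      for (let i = 0; i < threshold - 2; i++) {
        const start = Math.floor(i * bucketSize) + 1;
        const end = Math.floor((i + 1) * bucketSize) + 1;
        const nextEnd = Math.min(Math.floor((i + 2) * bucketSize) + 1, data.length);

        // Average of the next bucket: the triangle's third corner.
        let avgX = 0, avgY = 0;
        for (let j = end; j < nextEnd; j++) { avgX += data[j].x; avgY += data[j].y; }
        avgX /= nextEnd - end;
        avgY /= nextEnd - end;

        // Pick the point with the largest triangle area (the expression
        // below is proportional to the area, which is enough to compare).
        let best = data[start], bestArea = -1;
        for (let j = start; j < end; j++) {
          const area = Math.abs(
            (prev.x - avgX) * (data[j].y - prev.y) -
            (prev.x - data[j].x) * (avgY - prev.y)
          );
          if (area > bestArea) { bestArea = area; best = data[j]; }
        }
        out.push(best);
        prev = best;
      }

      out.push(data[data.length - 1]);
      return out;
    }

    // e.g. trim two thousand entries down to two hundred:
    // const reduced = downsample(points, 200);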

answered Sep 17 '22 by MasterScrat