
Graphite does not graph values correctly when using long durations?

Tags: graphite

I'm trying to graph data using statsd and Graphite. I have a simple counter that I increment by 1, but when I graph the counter's values over a day, I see strange values like 0.09 as the peak in my graph (see http://i.stack.imgur.com/o4gmz.png)

This graph should be showing 2 logins, but instead it's showing 0.09. If I change the time scale from 1 day to the last 15 minutes, then it correctly shows the two logins (see http://i.stack.imgur.com/23vDJ.png)

I've set up my finest retention to be in 10s increments in storage-schemas.conf:

retentions = 10s:7d,1m:21d,24h:5y

I've set up my storage-aggregation.conf file to sum counts:

[sum]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum

(And, before you ask, yes; this is a .count).

If I try my URL with &rawData=true then in either case I see some Nones, some 0.0s, and a pair of 1.0s separated by some 0.0s. I never see these fractional values that somehow show up on the graph. So... Is this a bug? Am I doing something wrong?

Jason Walton asked Oct 22 '12 20:10





1 Answer

There's also the consolidateBy function, which tells Graphite what to do when there aren't enough pixels to draw every datapoint accurately. By default it uses the average, which is why you get strange results over longer time ranges. Here is an excerpt from the documentation:

When a graph is drawn where width of the graph size in pixels is smaller than the number of datapoints to be graphed, Graphite consolidates the values to prevent line overlap. The consolidateBy() function changes the consolidation function from the default of ‘average’ to one of ‘sum’, ‘max’, or ‘min’. This is especially useful in sales graphs, where fractional values make no sense and a ‘sum’ of consolidated values is appropriate.
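To make the effect concrete, here is a rough Python sketch of that consolidation step. It is a simplified model and not Graphite's actual code: the consolidate helper, the positions of the two logins, and the figure of 22 datapoints per pixel (roughly one day of 10s data on a graph about 400 pixels wide) are all assumptions made for illustration.

```python
from typing import List, Optional


def consolidate(values: List[Optional[float]],
                values_per_point: int,
                func: str = "average") -> List[Optional[float]]:
    """Collapse each run of `values_per_point` datapoints into one value.

    Simplified model of graph-time consolidation: None values are
    skipped, and a bucket containing only None stays None.
    """
    out: List[Optional[float]] = []
    for start in range(0, len(values), values_per_point):
        bucket = [v for v in values[start:start + values_per_point]
                  if v is not None]
        if not bucket:
            out.append(None)
        elif func == "sum":
            out.append(sum(bucket))
        elif func == "max":
            out.append(max(bucket))
        elif func == "min":
            out.append(min(bucket))
        else:  # "average", the default
            out.append(sum(bucket) / len(bucket))
    return out


# One day at 10s resolution is 8640 datapoints.
points: List[Optional[float]] = [0.0] * 8640
# Hypothetical positions: two logins 30 seconds apart.
points[4000] = 1.0
points[4003] = 1.0

VALUES_PER_POINT = 22  # assumed: ~8640 datapoints on ~400 pixels

averaged = consolidate(points, VALUES_PER_POINT, "average")
summed = consolidate(points, VALUES_PER_POINT, "sum")

peak_avg = max(v for v in averaged if v is not None)
peak_sum = max(v for v in summed if v is not None)
print(round(peak_avg, 2), peak_sum)
```

Under these assumptions both logins land in the same 22-point bucket, so the averaged peak is 2/22 (about 0.09) while the summed peak is 2.0, which would be consistent with the numbers in the question.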

Another function that could be useful is hitcount. Here is a short excerpt from the documentation on why it's useful:

This function is like summarize(), except that it compensates automatically for different time scales (so that a similar graph results from using either fine-grained or coarse-grained records) and handles rarely-occurring events gracefully.
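For reference, here is a small Python sketch of how render URLs using these two functions might be put together. The host graphite.example.com and the metric name stats.logins.count are placeholders, not values from the question. As far as I understand the documentation, hitcount assumes the series holds hits per second, so it may suit a rate series better than a raw count; compare the result against rawData before trusting it.

```python
from urllib.parse import urlencode

GRAPHITE = "http://graphite.example.com"  # hypothetical host
METRIC = "stats.logins.count"             # hypothetical metric name


def render_url(target: str, period: str = "-1d") -> str:
    """Build a /render URL for one target over a relative time range."""
    query = urlencode({"target": target, "from": period})
    return f"{GRAPHITE}/render?{query}"


# Sum the datapoints that share a pixel instead of averaging them.
summed_url = render_url(f"consolidateBy({METRIC}, 'sum')")

# Estimate hits per hour with hitcount().
hourly_url = render_url(f"hitcount({METRIC}, '1hour')")

print(summed_url)
print(hourly_url)
```

Appending &rawData=true to either URL should show the values before graph-time consolidation, which makes it easier to tell which step produced a fractional number.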

I spent some time scratching my head over why I got fractions for my counter with time ranges longer than a couple of hours when my aggregation rule was max. It's pretty confusing, especially at the beginning, when you play with single counters to see if everything works. Checking rawData is a good sanity check when debugging ;)

slawek answered Sep 28 '22 01:09
