 

Graphite Graph - how fast can we update the graph?

We are trying to use Graphite for a (near) real-time graphing web system. However, we cannot seem to get Graphite to update faster than once per second. Ultimately we would like 100 ms updates.

From reading the FAQ it sounds like Graphite is fast, but either that is very misleading or I am not understanding how to speed it up.

The timing information for Whisper appears to use UNIX timestamps, i.e., whole seconds.

From the Graphite FAQ:

How scalable is Graphite?

From a CPU perspective, Graphite scales horizontally on both the frontend and the backend, meaning you can simply add more machines to the mix to get more throughput. It is also fault tolerant in the sense that losing a backend machine will cause a minimal amount of data loss (whatever that machine had cached in memory) and will not disrupt the system if you have sufficient capacity remaining to handle the load.

From an I/O perspective, under load Graphite performs lots of tiny I/O operations on lots of different files very rapidly. This is because each distinct metric sent to Graphite is stored in its own database file, similar to how many tools (drraw, Cacti, Centreon, etc) built on top of RRD work. In fact, Graphite originally did use RRD for storage until fundamental limitations arose that required a new storage engine.

High volume (a few thousand distinct metrics updating minutely) pretty much requires a good RAID array. Graphite's backend caches incoming data if the disks cannot keep up with the large number of small write operations that occur (each data point is only a few bytes, but most disks cannot do more than a few thousand I/O operations per second, even if they are tiny). When this occurs, Graphite's database engine, whisper, allows carbon to write multiple data points at once, thus increasing overall throughput only at the cost of keeping excess data cached in memory until it can be written.

How real-time are the graphs?

Very. Even under heavy load, where the number of metrics coming in each time interval is much greater than the rate at which your storage system can perform I/O operations and lots of data points are being cached in the storage pipeline (see previous question for explanation), Graphite still draws real-time graphs. The trick is that when the Graphite webapp receives a request to draw a graph, it simultaneously retrieves data off the disk as well as from the pre-storage cache (which may be distributed if you have multiple backend servers) and combines the two sources of data to create a real-time graph.

Also, the carbon configuration docs show only whole seconds, with no fractional values: http://graphite.readthedocs.org/en/latest/config-carbon.html. And per the URL API reference (http://graphite.wikidot.com/url-api-reference), from and until must be a time specification conforming to the AT-STYLE time specification described at http://oss.oetiker.ch/rrdtool/doc/rrdfetch.en.html.
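For instance, the finest relative offset the render API appears to accept is whole seconds; something like from=-500ms is not expressible (the host and metric name below are just examples):

    http://graphite.example.com/render?target=stats.myapp.latency&from=-10s&until=now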

So which is it? Is Graphite fast, or is it only fast at processing large datasets? We are looking for a simple-to-use web receiver of packet data to display visually. Graphite seemed like a great solution, but now that we have it all configured and running, I am guessing we just wasted a lot of time.

Thanks!

asked Oct 01 '13 by GregM




1 Answer

Graphite will store at most one data point per the finest precision defined in your storage-schemas.conf (additional data points received within that interval are dropped). The finest precision possible is 1 second, e.g. retentions = 1s:6h,1min:7d,10min:5y.
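For reference, a minimal storage-schemas.conf sketch along those lines (the section name and pattern here are placeholders; match them to your own metric names):

    [realtime]
    pattern = ^stats\.
    retentions = 1s:6h,1min:7d,10min:5y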

In order to meet your goals you'll need to put an aggregator in front of Graphite. The aggregator takes in all the raw metrics, aggregates the data, and flushes to Graphite storage at a rate that matches your storage schema. It performs calculations (avg, sum, max, etc.) on the metrics and sends the results on, e.g. over the last second you averaged 14 ms to process the request, or over the last 10 seconds the total number of requests was 4234.
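To make the idea concrete, here is a minimal Python sketch of what an aggregator does: buffer sub-second samples and flush a 1-second average to Carbon's plaintext protocol. The host, port, and metric name are assumptions for illustration; in practice you'd use StatsD or carbon-aggregator rather than rolling your own:

    import socket
    import time

    CARBON_HOST = "localhost"       # assumed Carbon host
    CARBON_PORT = 2003              # Carbon's default plaintext listener port
    METRIC = "app.request_time_ms"  # hypothetical metric name

    sock = socket.create_connection((CARBON_HOST, CARBON_PORT))
    samples = []
    last_flush = time.time()

    def record(value):
        """Buffer a sub-second sample; flush a 1-second average to Carbon."""
        global samples, last_flush
        samples.append(value)
        now = time.time()
        if now - last_flush >= 1.0:  # match the 1s finest precision in storage-schemas.conf
            avg = sum(samples) / len(samples)
            # Carbon plaintext protocol: "<metric path> <value> <unix timestamp>\n"
            sock.sendall(f"{METRIC} {avg:.3f} {int(now)}\n".encode())
            samples = []
            last_flush = now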

So, while you can't report at a finer granularity than 1 second, you can use the aggregator to capture, say, the sum and average of what happened during each 1-second interval and report that.

Two common choices are StatsD and the carbon-aggregator that ships with Graphite.

StatsD, IMO, is the way to go. It is a network daemon that you run separately and send metrics to over UDP. That said, you can do much the same (e.g. UDP input) with carbon-aggregator.py.
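For example, sending a timing sample to StatsD takes nothing more than a UDP datagram (the metric name here is made up; 8125 is StatsD's default port):

    import socket

    # StatsD line protocol: "<bucket>:<value>|<type>" -- "ms" marks a timing sample
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(b"app.request_time_ms:14|ms", ("localhost", 8125))

StatsD then aggregates everything it receives and flushes the summary statistics to Graphite once per flush interval.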

answered Sep 21 '22 by Matt Self