scaling statsd with multiple servers

Tags: scale, scaling, statsd

I am laying out an architecture where we will be using statsd and Graphite. I understand how Graphite works and how a single statsd server communicates with it, but I am wondering how the architecture and setup would work for scaling out statsd. Would you run multiple Node statsd servers and then one central statsd server pushing to Graphite? I couldn't find anything about scaling out statsd, so any ideas on how to run multiple statsd servers would be appreciated.

asked Oct 13 '12 09:10

Shawn

1 Answer

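I'm dealing with the same problem right now. Doing naive load balancing between multiple statsds obviously doesn't work, because keys with the same name would end up in different statsds and thus be aggregated incorrectly: each instance only sees part of the traffic for a key and flushes its own partial count (or its own timer percentiles) to Graphite under the same name, so the values Graphite stores for that key are wrong.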

But there are a couple of options for using statsd in an environment that needs to scale:

- Use client-side sampling for counter metrics, as described in the statsd documentation (i.e. instead of sending every event to statsd, send only about one event in ten, flagged with a sample rate of 0.1, and let statsd multiply the count by 10). The downside is that you need to set an appropriate sampling rate for each of your metrics by hand. If the rate is too low, your results will be inaccurate; if it is too high, you'll kill your (single) statsd instance.
- Build a custom load balancer that shards by metric name to different statsds, thus circumventing the problem of broken aggregation: every sample of a given metric always reaches the same statsd. Each of those statsds can then write directly to Graphite.
- Build a statsd client that counts events locally and only sends them in aggregate to statsd. This greatly reduces the traffic going to statsd and also keeps it constant (as long as you don't add more servers). As long as the interval at which you send data to statsd is much shorter than statsd's own flush interval, you should also get similarly accurate results.
- A variation of the last point that I have implemented with great success in production: use a first layer of multiple (in my case local) statsds, which in turn all aggregate into one central statsd, which then talks to Graphite. The first layer of statsds needs a much shorter flush interval than the second. To do this, you will need a statsd-to-statsd backend. Since I faced exactly this problem, I wrote one that tries to be as network-efficient as possible: https://github.com/juliusv/ne-statsd-backend
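To make the client-side sampling option concrete, here is a minimal Python sketch of a sampled counter. It assumes the statsd line format `name:value|c|@rate` and statsd's default UDP port 8125; the helper names (`udp_sender`, `format_counter`, `maybe_send`) are hypothetical and not part of any statsd client library.

```python
import random
import socket
from typing import Callable


def udp_sender(host: str = "127.0.0.1", port: int = 8125) -> Callable[[bytes], None]:
    """Return a function that sends one UDP datagram to a statsd server.

    8125 is statsd's default port; host and port here are placeholders.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(data: bytes) -> None:
        sock.sendto(data, (host, port))

    return send


def format_counter(name: str, value: int = 1, sample_rate: float = 1.0) -> str:
    """Build a statsd counter line, e.g. 'page.views:1|c|@0.1'.

    The '|@rate' suffix tells statsd the event was sampled, so statsd
    scales the received value by 1 / rate when it aggregates.
    """
    line = f"{name}:{value}|c"
    if sample_rate < 1.0:
        line += f"|@{sample_rate}"
    return line


def maybe_send(
    send: Callable[[bytes], None],
    name: str,
    value: int = 1,
    sample_rate: float = 1.0,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Send the counter for roughly `sample_rate` of all calls.

    Returns True if a packet was sent. `rng` is injectable for testing.
    """
    if sample_rate < 1.0 and rng() >= sample_rate:
        return False  # dropped locally; statsd compensates via the rate
    send(format_counter(name, value, sample_rate).encode("utf-8"))
    return True
```

With `send = udp_sender()`, calling `maybe_send(send, "page.views", sample_rate=0.1)` on every page view sends roughly one packet per ten views. The rate is still a constant you pick by hand for each metric, which is the manageability problem described above.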
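The sharding option only works if every router agrees on which statsd owns a given metric name. Below is a minimal Python sketch, assuming consistent hashing with virtual nodes (one common sharding scheme; the answer does not prescribe a particular one). `MetricRouter`, `stable_hash` and the backend addresses are hypothetical. MD5 is used only as a stable hash, because Python's built-in `hash()` for strings is randomized per process and two routers would disagree. A plain `hash % n` would also shard, but adding a statsd would then remap most metrics, while a ring remaps only a fraction of them.

```python
import bisect
import hashlib
from typing import List, Tuple


def stable_hash(key: str) -> int:
    """Process-independent 64-bit hash (MD5 used for distribution, not security)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class MetricRouter:
    """Consistent-hash ring mapping each metric name to one statsd backend."""

    def __init__(self, backends: List[str], replicas: int = 100) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        # Each backend appears `replicas` times on the ring to even out load.
        ring: List[Tuple[int, str]] = sorted(
            (stable_hash(f"{backend}#{i}"), backend)
            for backend in backends
            for i in range(replicas)
        )
        self._hashes = [h for h, _ in ring]
        self._backends = [b for _, b in ring]

    def backend_for(self, line: str) -> str:
        """Pick the backend for one statsd line such as 'api.requests:1|c'.

        Only the metric name (text before the first ':') is hashed, so every
        sample of a metric lands on the same statsd regardless of value or type.
        Multi-metric packets must be split on newlines before routing.
        """
        name = line.split(":", 1)[0]
        # First ring point clockwise from the name's hash, wrapping around.
        idx = bisect.bisect(self._hashes, stable_hash(name)) % len(self._hashes)
        return self._backends[idx]
```

Each shard can then run an ordinary statsd with its own Graphite backend. Note that the router has to parse every incoming packet, so it becomes the component you need to keep fast.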
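The local-aggregation client can be sketched in the same way. This hypothetical `AggregatingCounter` handles counters only and batches one line per metric into a single newline-separated packet, which statsd accepts as a multi-metric packet. A production client would also need timers and gauges, a background timer calling `flush()` well below statsd's `flushInterval` (10 seconds by default), and splitting of large payloads to stay within a safe UDP packet size.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict


class AggregatingCounter:
    """Counts increments in memory and flushes one statsd line per metric."""

    def __init__(self, send: Callable[[bytes], None]) -> None:
        self._send = send
        self._counts: Dict[str, int] = defaultdict(int)
        self._lock = threading.Lock()  # incr() may be called from many threads

    def incr(self, name: str, value: int = 1) -> None:
        with self._lock:
            self._counts[name] += value

    def flush(self) -> int:
        """Send accumulated counts; returns the number of metrics sent."""
        # Swap the buffer under the lock, then do the network I/O outside it.
        with self._lock:
            counts, self._counts = self._counts, defaultdict(int)
        if not counts:
            return 0
        payload = "\n".join(f"{name}:{value}|c" for name, value in sorted(counts.items()))
        self._send(payload.encode("utf-8"))
        return len(counts)
```

A thousand `incr("api.requests")` calls between flushes become a single `api.requests:1000|c` line, so statsd's load depends on the number of distinct metrics and clients rather than on the event rate.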

As it is, statsd was unfortunately not designed to scale in a manageable way (no, I don't see adjusting sampling rates manually as "manageable"). But the workarounds above should help if you are stuck with it.

answered Sep 28 '22 05:09

Julius