
Class Architecture of Monitoring Log Data

I have a working real-time monitoring program, but its class architecture is too complex, and this really bothers me. Let me start by explaining the program.

User Interaction

This is a monitoring program with user interaction: the user can select different dimensions and metrics, include them, exclude them, or group them, and the real-time chart updates according to the user's choices.

Example Log Data from DB

Req Success OrderFunction 5 60ms WebServer2
Req Failed  OrderFunction 2 176ms WebServer5
Resp Success SuggestFunction 8 45ms WebServer2

The Conversion

So every row matters, along with every one of its columns, and it has to be on the client side in this form, because the user can choose to see successful OrderFunctions, or all the functions on WebServer2, or all failed requests, etc. I need all the relations between these columns to support that.

Another thing: these are the values that come from the database. I also have lookups for these values, which hold the text that users need to see, e.g. Req is Request, Resp is Response.
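To make this concrete, here is a rough sketch in Python (purely for illustration; all class and field names are mine, not my real code) of how the rows and their lookups could be modeled:

from dataclasses import dataclass
from enum import Enum

# Hypothetical names; the real values and lookups live in my database.
class Direction(Enum):
    REQ = "Req"
    RESP = "Resp"

class Status(Enum):
    SUCCESS = "Success"
    FAILED = "Failed"

# Lookup table: raw DB value -> display text the user sees.
DIRECTION_LABELS = {Direction.REQ: "Request", Direction.RESP: "Response"}

@dataclass(frozen=True)
class LogRow:
    direction: Direction
    status: Status
    function: str    # e.g. "OrderFunction"
    count: int       # the 5 / 2 / 8 column
    latency_ms: int
    server: str      # e.g. "WebServer2"

rows = [
    LogRow(Direction.REQ, Status.SUCCESS, "OrderFunction", 5, 60, "WebServer2"),
    LogRow(Direction.REQ, Status.FAILED, "OrderFunction", 2, 176, "WebServer5"),
    LogRow(Direction.RESP, Status.SUCCESS, "SuggestFunction", 8, 45, "WebServer2"),
]

# "Successful OrderFunctions" is then just a predicate over the typed rows:
successful_orders = [r for r in rows
                     if r.status is Status.SUCCESS and r.function == "OrderFunction"]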

I know this may look like a very general question, but I'm trying to find my way. Maybe this kind of class architecture has a name in the industry. I'm just here for some advice to point me in the right direction.

Thanks a lot

asked Nov 13 '22 by Xelom

1 Answer

15k records every 3 minutes sounds a lot like what I used to see with network monitoring applications in data centers (SNMP can get very noisy in that kind of environment). What we'd do is determine how much of the data we needed, for how long, and at what level of granularity, and that information went into deciding what kind of roll-up strategy to use, as well as how much storage space we were willing to throw at the problem. With a roll-up strategy, where you combine rows over time by merging their columns, you can guarantee a finite limit on the size of the database.
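As a rough sketch of what I mean by a roll-up (Python purely for illustration; the bucket size, the choice of key columns, and the field layout are all assumptions):

from collections import defaultdict

BUCKET_SECONDS = 300  # 5-minute buckets; tune per retention tier

def rollup(rows):
    # rows: iterable of (epoch_seconds, key_tuple, count, latency_ms),
    # where key_tuple holds the columns you group by, e.g.
    # (direction, status, function, server).
    acc = defaultdict(lambda: [0, 0])  # (bucket, key) -> [total_count, latency_sum]
    for ts, key, count, latency_ms in rows:
        bucket = ts - (ts % BUCKET_SECONDS)  # snap to the bucket boundary
        slot = acc[(bucket, key)]
        slot[0] += count
        slot[1] += latency_ms * count  # weight latency by count for a fair mean
    # Finite size: at most (#buckets kept) x (#distinct keys) entries.
    return {k: (c, s / c) for k, (c, s) in acc.items()}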

There are probably newer tools out there these days, but I used to use RRD (http://oss.oetiker.ch/rrdtool/) and BerkeleyDB, for example, for these kinds of monitoring problems. You can also take advantage of software de-duplication, an approach where you merely update a count if a row is found to be similar to a previous row, judging by the contents of its columns. We used to do this to prevent event storms from flooding NOC screens and causing technicians to miss critical events. (By the way, I would have left this as a comment, but Stack Overflow's reputation system prevents me; I only started answering questions here yesterday.)
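And a rough sketch of that de-duplication idea (again Python for illustration; which columns count as a row's "identity" is an assumption you'd make from your own data):

class Deduper:
    def __init__(self):
        self.seen = {}  # identity tuple -> [occurrences, first_row]

    def add(self, row):
        # Rows that agree on these columns are treated as "the same event".
        identity = (row.direction, row.status, row.function, row.server)
        if identity in self.seen:
            self.seen[identity][0] += 1     # same event again: just bump the count
        else:
            self.seen[identity] = [1, row]  # first occurrence: keep the row itself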

So to be more complete, using your data as an example:

Req Success OrderFunction 5 60ms WebServer2
Req Failed  OrderFunction 2 176ms WebServer5
Resp Success SuggestFunction 8 45ms WebServer2

I assume Req/Resp are the only two values, corresponding to request and response? Going column by column:

- Req/Resp: if those really are the only two values, make the column binary, a 1-bit flag for whether it was a request or not.
- Success/Failed: sounds like 1 bit, or at worst a ternary, 2-bit field.
- The functions (OrderFunction, SuggestFunction, etc.) can probably be enumerated, or, if you are de-duplicating here, turned into a bitmask; you could also just use a foreign key into a join table. In the enumerated option, if you have fewer than 256 of these but more than 128, use a byte. If you roll them up in an event de-duplication solution to save rows, especially when they are coming in that fast, and you have 256 options, then you need exactly that many bits in your bitmask, unless not every permutation needs to be represented, in which case figure out the maximum number of permutations and use that many bits so the de-duplication rolls up correctly.
- The column with 5, 2, and 8 in it: I'm not sure what that represents, an integer of some kind or maybe just a byte?
- The milliseconds can be represented, depending on your SQL dialect and the maximum you expect to need, with an int, an unsigned short, or maybe just a short (which tops out at roughly 32.7 seconds). If you use a short or unsigned short, make sure your application logic represents values beyond the max as the max, not as zero.
- The last column looks like a string naming your servers, so it is probably a column you would use to help guide de-duplication or roll-up; you could make it a foreign key as well.
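Putting that sizing together, a minimal packing sketch (Python, purely illustrative; the exact layout and limits are assumptions about your data):

import struct

# Assumed layout, 6 bytes per row:
#   1 byte: bit 0 = is_request, bit 1 = is_success (a ternary status needs 2 bits)
#   1 byte: function id (works while you have <= 256 functions)
#   1 byte: the count column (if it never exceeds 255)
#   2 bytes: latency in ms as an unsigned short, capped at 65535
#   1 byte: server id (a foreign key into a server lookup table)
RECORD = struct.Struct("<BBBHB")

def pack(is_request, is_success, function_id, count, latency_ms, server_id):
    flags = (1 if is_request else 0) | ((1 if is_success else 0) << 1)
    latency = min(latency_ms, 0xFFFF)  # clamp at the max, never wrap to zero
    return RECORD.pack(flags, function_id, min(count, 255), latency, server_id)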

Anyway, RRD used to be really good but I haven't used it in nearly a dozen years - I take that back, I haven't used RRD in over a dozen years :). I'm sure BerkeleyDB is still a good database though for this kind of thing - so check out those tools and tools like them and I'm sure a good solution will come out of it.

Hope that helps!

answered Nov 15 '22 by Matt Mullens