What's the best practice for centralised logging? [closed]

Tags: logging, centralized

My team has inherited support for 100+ applications. The applications don't have any kind of common architecture, so the ones that do logging usually do it with custom code to local files or a local database, and it's all unmanaged. We want to change that.

We're slowly migrating the applications over to using log4net and standardising the types of things that are logged. The next question becomes: where should we send the logs?

I was thinking that it would be good to use a central SQL Server dedicated to receiving all the logs, which would provide easy maintenance (one place for backups/archiving) and provide the future possibility of some data mining and trend analysis.

Is that the best practice for this kind of thing, or is there some dedicated application logging server we should be looking at instead?

Update: I should have been more clear than just casually mentioning log4net and SQL Server: we're a Microsoft house, with most things written in .NET. UNIX solutions are no good for us.

470

asked Nov 15 '09 14:11

Stewart Johnson

3 Answers

One word of caution: at 100+ apps in a big shop, with hundreds or perhaps thousands of hosts running those apps, steer clear of anything that induces tight coupling. This pretty much rules out connecting directly to SQL Server or any other database, because your application logging will then depend on the availability of the log repository.

Availability of the central repository is a little more complicated than just 'if you can't connect, don't log it', because the most interesting events usually occur when there are problems, not when things go smoothly. If your logging drops entries exactly when things turn interesting, it will never be trusted to solve incidents, and as such it will fail to gain traction and support from other stakeholders (i.e. the application owners).
If you decide that you can implement retention and retry of failed log delivery on your own, you are facing an uphill battle: it is not a trivial task and is much more complex than it sounds, starting with efficient and reliable storage of the retained information and ending with good retry and intelligent fallback logic.
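To make the retention-and-retry problem concrete, here is a minimal sketch of a local store-and-forward spool. It is written in Python with SQLite purely for illustration; the class name `LogSpool`, the `send` callable and the backoff parameters are all assumptions, not part of log4net or any product mentioned here.

```python
import json
import sqlite3
import time


class LogSpool:
    """Durable local buffer: entries stay on disk until delivery succeeds."""

    def __init__(self, path=":memory:"):
        # Use a file path in practice so entries survive a process restart.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS spool ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "created REAL NOT NULL, "
            "payload TEXT NOT NULL)"
        )
        self.db.commit()

    def append(self, entry):
        """Persist the entry locally before any delivery attempt."""
        self.db.execute(
            "INSERT INTO spool (created, payload) VALUES (?, ?)",
            (time.time(), json.dumps(entry)),
        )
        self.db.commit()

    def pending(self):
        """Number of entries still waiting for delivery."""
        return self.db.execute("SELECT COUNT(*) FROM spool").fetchone()[0]

    def flush(self, send, max_attempts=3, base_delay=0.0):
        """Deliver oldest first; keep everything undelivered for the next flush."""
        rows = self.db.execute(
            "SELECT id, payload FROM spool ORDER BY id"
        ).fetchall()
        delivered = 0
        for row_id, payload in rows:
            for attempt in range(max_attempts):
                try:
                    send(json.loads(payload))
                    break
                except OSError:
                    # Exponential backoff between attempts.
                    time.sleep(base_delay * (2 ** attempt))
            else:
                # Every attempt failed: repository is still down, stop here.
                return delivered
            # Delete only after a successful send (at-least-once delivery).
            self.db.execute("DELETE FROM spool WHERE id = ?", (row_id,))
            self.db.commit()
            delivered += 1
        return delivered
```

Even this toy version leaves open questions: a crash between send and delete produces a duplicate, the spool can grow without bound during a long outage, and nothing here handles authentication. That is roughly why rolling your own is harder than it sounds.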

You must also have an answer to the problems of authentication and security. Large orgs have multiple domains with various trust relationships, employees come in via VPN or DirectAccess from home, some applications run unattended, some services are configured to run as local users, some machines are not joined to the domain, and so on. You had better have an answer to the question of how the logging module of each application, everywhere it is deployed, is going to authenticate with the central repository (and which situations are going to be unsupported).

Ideally you would use an out-of-the-box delivery mechanism for your logging module. MSMQ is probably the most appropriate fit: robust, asynchronous, reliable delivery (at least to the extent most use cases need), available on every Windows host once it is installed (it is an optional component). That is the major pain point: your applications will take a dependency on a non-default OS component.
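The decoupling that a message queue buys you can be sketched with Python's standard library. This is not MSMQ: the queue below is in-process and not durable, and `ListHandler` is a hypothetical stand-in for whatever transport actually sends to the repository. It only shows the shape of the idea, where the application enqueues and returns immediately while delivery happens elsewhere.

```python
import logging
import logging.handlers
import queue


class ListHandler(logging.Handler):
    """Hypothetical stand-in for the real transport (e.g. a queue sender)."""

    def __init__(self):
        super().__init__()
        self.delivered = []

    def emit(self, record):
        self.delivered.append(record.getMessage())


def build_async_logger(name, transport):
    """Return (logger, listener): the logger only enqueues, the listener delivers."""
    buffer = queue.Queue(-1)  # unbounded in-process queue
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(logging.handlers.QueueHandler(buffer))
    listener = logging.handlers.QueueListener(buffer, transport)
    listener.start()  # delivery runs on a background thread
    return logger, listener


transport = ListHandler()
logger, listener = build_async_logger("orders", transport)
logger.info("order %s accepted", 42)  # returns without waiting for delivery
listener.stop()  # drains the queue before returning
```

A real transport would additionally have to persist messages across restarts and outages, which is the part MSMQ gives you out of the box.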

The central repository storage has to be able to deliver the information requested by, perhaps:

the application developers investigating incidents
the customer support team investigating a lost transaction reported in a customer complaint
the security org doing forensics
the business managers demanding statistics, trends and aggregated info (BI).

The only storage capable of delivering this for any serious org (size, lifetime) is a relational engine, so probably SQL Server. Doing analysis over text files is really not going to go the distance.
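As a rough illustration of why a relational store serves these different audiences, the sketch below uses SQLite as a stand-in for SQL Server. The table layout, column names and sample rows are all made up for the example; a real schema would need far more thought about volume, partitioning and retention.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE log_entry (
        id INTEGER PRIMARY KEY,
        logged_at TEXT NOT NULL,      -- ISO 8601 UTC timestamp
        application TEXT NOT NULL,
        host TEXT NOT NULL,
        level TEXT NOT NULL,
        correlation_id TEXT,          -- ties entries to one transaction
        message TEXT NOT NULL
    )
    """
)
db.execute("CREATE INDEX ix_log_app_time ON log_entry (application, logged_at)")

# Invented sample data, for illustration only.
rows = [
    ("2009-11-15T14:00:01Z", "billing", "web01", "ERROR", "tx-1001", "payment gateway timeout"),
    ("2009-11-15T14:00:02Z", "billing", "web01", "INFO", "tx-1001", "transaction rolled back"),
    ("2009-11-15T14:05:10Z", "billing", "web02", "ERROR", "tx-1002", "payment gateway timeout"),
    ("2009-11-16T09:30:00Z", "catalog", "web03", "ERROR", None, "image cache miss storm"),
]
db.executemany(
    "INSERT INTO log_entry "
    "(logged_at, application, host, level, correlation_id, message) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)


def trace_transaction(correlation_id):
    """Support view: every entry for one transaction, in time order."""
    return db.execute(
        "SELECT logged_at, level, message FROM log_entry "
        "WHERE correlation_id = ? ORDER BY logged_at",
        (correlation_id,),
    ).fetchall()


def errors_per_day():
    """Trend view: error counts per application per day."""
    return db.execute(
        "SELECT application, substr(logged_at, 1, 10) AS day, COUNT(*) "
        "FROM log_entry WHERE level = 'ERROR' "
        "GROUP BY application, day ORDER BY application, day"
    ).fetchall()
```

The same table answers a support question (what happened to one transaction) and a management question (how errors trend per application), which is hard to do over scattered text files.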

So I would recommend a messaging-based log transport/delivery (MSMQ) and a relational central repository (SQL Server), perhaps with an analytical component on top of it (Analysis Services Data Mining). As you can see, this is clearly no small feat, and it covers slightly more than just configuring log4net.

As for what to log, you say you have already given it some thought, but I'd like to chime in with my extra 2c: often, especially during incident investigation, you will want the ability to request extra information. This means you would like to know the content of certain files on the incident machine, or some registry keys, or some performance counter values, or a full process dump. It is very useful to be able to request this information from the central repository interface, but it is impractical to always collect it just in case it is needed. This implies there has to be some sort of bidirectional communication between the application and the central repository: when the application reports an incident, it can be asked to add extra information (e.g. a dump of the process at fault). There has to be a lot of infrastructure in place for something like this to work, from the protocol between the application logging and the central repository, to the ability of the central repository to recognize a repeat incident, to the capacity of the logging library to collect the extra information required, and not least the ability of an operator to mark incidents as needing extra information on the next occurrence.
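The request-on-next-occurrence flow can be sketched as follows. Everything here is hypothetical: the `Repository` class, the fingerprint scheme and the `collectors` mapping are illustrative names only, and a real system would run this over a network protocol with authentication rather than direct method calls.

```python
import hashlib


class Repository:
    """Toy central repository that recognises repeat incidents by fingerprint."""

    def __init__(self):
        self.counts = {}   # fingerprint -> number of reports seen
        self.wanted = {}   # fingerprint -> extras an operator asked for
        self.extras = {}   # fingerprint -> extras actually received

    @staticmethod
    def fingerprint(application, error_type, location):
        """Stable identifier so the same fault is recognised when it repeats."""
        key = "|".join((application, error_type, location))
        return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]

    def request_extras(self, fingerprint, kinds):
        """Operator action: ask for more data on the next occurrence."""
        self.wanted[fingerprint] = list(kinds)

    def report(self, fingerprint):
        """Record an incident; the reply lists extras the client should send."""
        self.counts[fingerprint] = self.counts.get(fingerprint, 0) + 1
        return self.wanted.pop(fingerprint, [])

    def upload(self, fingerprint, extras):
        self.extras[fingerprint] = extras


def report_incident(repo, application, error_type, location, collectors):
    """Client side: report, then collect only what the repository asks for."""
    fp = Repository.fingerprint(application, error_type, location)
    requested = repo.report(fp)
    extras = {kind: collectors[kind]() for kind in requested if kind in collectors}
    if extras:
        repo.upload(fp, extras)
    return fp, requested
```

The first report of an incident costs nothing extra. Only after an operator calls `request_extras` does the next occurrence trigger the expensive collection, and the request is cleared once it has been served.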

I understand this answer probably seems like overkill at the moment, but I was involved with this problem space for quite a while, and I looked at many online crash reports from Dr. Watson back in the day when I was with MS. I can tell you that these requirements exist, they are valid concerns, and when implemented the solution helps tremendously. Ultimately, you can't fix what you cannot measure. A large organisation depends on good management and monitoring of its application stock, including logging and auditing.

There are some third-party vendors that offer solutions, some even integrated with log4net, such as bugcollect.com (full disclosure: that's my own company), Error Traffic Controller, Exceptioneer and others.

155

answered Oct 23 '22 12:10

Remus Rusanu

Logstash + Elasticsearch + Kibana + Redis or RabbitMQ + NLog or log4net

Storage + Search & Analytics: Elasticsearch
Collecting & Parsing: Logstash
Visualize: Kibana
Queue & Buffer: Redis or RabbitMQ
In Application: NLog or log4net

answered Oct 23 '22 12:10

mehmet mecek

The 1024 byte Syslog message length limit mentioned so far is misleading and incorrectly biases against Syslog-based solutions to the problem.

The limit for the obsolete "BSD Syslog Protocol" is indeed 1024 bytes.

The BSD syslog Protocol - 4.1 syslog Message Parts

The limit for the modern "Syslog Protocol" is implementation-dependent but MUST be at least 480 bytes, SHOULD be at least 2048 bytes, and MAY be even higher.

The Syslog Protocol - 6.1. Message Length

As an example, rsyslog's configuration setting is called MaxMessageSize, which the documentation suggests can be set at least as high as 64 KB.

rsyslog - Configuration Directives
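A small sketch of what these limits mean for a sender, using the figures quoted above. The constant and function names are made up for illustration, and the check covers only the message text; in a real syslog packet the header fields count towards the limit as well.

```python
BSD_SYSLOG_MAX = 1024         # obsolete BSD syslog protocol: hard limit
SYSLOG_MUST_ACCEPT = 480      # modern protocol: every receiver must accept this
SYSLOG_SHOULD_ACCEPT = 2048   # modern protocol: receivers should accept this


def fits(message, limit):
    """True if the UTF-8 encoded message is within the receiver's limit."""
    return len(message.encode("utf-8")) <= limit


def truncate_to_limit(message, limit):
    """Cut the message to at most `limit` bytes without splitting a character."""
    raw = message.encode("utf-8")[:limit]
    # errors="ignore" drops a trailing partial multi-byte sequence, if any.
    return raw.decode("utf-8", errors="ignore")
```

A 2000-byte message that would be rejected or truncated under the old 1024-byte limit fits comfortably within the 2048 bytes that modern receivers are expected to accept.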

That the asker's organisation is "a Microsoft house" where "UNIX solutions are no good" should not prevent less discriminatory readers from getting accurate information.

answered Oct 23 '22 12:10

Ron MacNeil
