What's the best practice for centralised logging? [closed]

My team has inherited support for 100+ applications. The applications don't have any kind of common architecture, so the ones that do logging usually do it with custom code to local files or a local database, and it's all unmanaged. We want to change that.

We're slowly migrating the applications over to using log4net and standardising the types of things that are logged. The next question becomes: where should we send the logs?

I was thinking that it would be good to use a central SQL Server dedicated to receiving all the logs, which would provide easy maintenance (one place for backups/archiving) and provide the future possibility of some data mining and trend analysis.

Is that the best practice for this kind of thing, or is there some dedicated application logging server we should be looking at instead?

Update: I should have been more clear than just casually mentioning log4net and SQL Server: we're a Microsoft house, with most things written in .NET. UNIX solutions are no good for us.

Stewart Johnson asked Nov 15 '09 14:11



3 Answers

One word of caution: at 100+ apps in a big shop, with hundreds or perhaps thousands of hosts running those apps, steer clear of anything that induces tight coupling. This pretty much rules out connecting directly to SQL Server or any other database, because your application logging would then depend on the availability of the log repository.

Availability of the central repository is a little more complicated than just 'if you can't connect, don't log it', because the most interesting events usually occur when there are problems, not when things go smoothly. If your logging drops entries exactly when things turn interesting, it will never be trusted to solve incidents, and as such it will fail to gain traction and support from other stakeholders (i.e. the application owners).

If you decide to implement retention and retry of failed log delivery on your own, you are facing an uphill battle: it is not a trivial task and is much more complex than it sounds, starting with efficient and reliable storage of the retained information and ending with good retry and intelligent fallback logic.
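To make the retention-and-retry idea concrete, here is a minimal store-and-forward sketch in Python (used purely as a neutral illustration language; nothing here is log4net-specific). The class name `SpoolingLogSender` and the `send` callable are hypothetical, and the sketch assumes that `send` raises `OSError` when the repository is unreachable.

```python
import json
import os


class SpoolingLogSender:
    """Append every entry to a local spool file, then try to forward the backlog."""

    def __init__(self, spool_path, send):
        self.spool_path = spool_path  # local file that survives repository outages
        self.send = send              # hypothetical callable; assumed to raise OSError on failure

    def log(self, entry):
        # Persist first, so the entry is not lost if delivery fails.
        with open(self.spool_path, "a", encoding="utf-8") as spool:
            spool.write(json.dumps(entry) + "\n")
        self.flush()

    def flush(self):
        """Try to deliver spooled entries in order; keep whatever could not be sent."""
        if not os.path.exists(self.spool_path):
            return 0
        with open(self.spool_path, encoding="utf-8") as spool:
            pending = [json.loads(line) for line in spool if line.strip()]
        delivered = 0
        for entry in pending:
            try:
                self.send(entry)
            except OSError:
                break  # repository unreachable: stop here and retry on a later call
            delivered += 1
        # Rewrite the spool so it holds only the undelivered tail.
        with open(self.spool_path, "w", encoding="utf-8") as spool:
            for entry in pending[delivered:]:
                spool.write(json.dumps(entry) + "\n")
        return delivered
```

Even this toy leaves out most of what makes the problem hard: a crash between sending and rewriting the spool produces duplicates, nothing bounds the spool size, and concurrent writers are not handled.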

You must also have an answer to the problems of authentication and security. Large orgs have multiple domains with various trust relations, employees come in via VPN or DirectAccess from home, some applications run unattended, some services are configured to run as local users, some machines are not joined to the domain, and so on. You had better have an answer to the question of how the logging module of each application, everywhere it is deployed, is going to authenticate with the central repository (and which situations are going to be unsupported).

Ideally you would use an out-of-the-box delivery mechanism for your logging module. MSMQ is probably the most appropriate fit: robust, asynchronous, reliable delivery (at least to the extent most use cases need), available on every Windows host once it is installed (it is an optional component). That is the major pain point: your applications will take a dependency on a non-default OS component.
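The decoupling that a message queue buys can be sketched with Python's standard `logging.handlers.QueueHandler` and `QueueListener`. This is only an in-process analogy: the in-memory `queue.Queue` stands in for a durable queue such as MSMQ and gives none of its delivery guarantees, and `ListHandler` is a hypothetical placeholder for whatever would write to the central repository.

```python
import logging
import logging.handlers
import queue

log_queue = queue.Queue()  # in-memory stand-in for a durable message queue


class ListHandler(logging.Handler):
    """Hypothetical stand-in for the handler that writes to the central repository."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record.getMessage())


repository_handler = ListHandler()
# The listener drains the queue on its own thread and feeds the repository handler.
listener = logging.handlers.QueueListener(log_queue, repository_handler)

logger = logging.getLogger("orders")
logger.setLevel(logging.INFO)
logger.propagate = False
# The application only ever enqueues; it never talks to the repository directly.
logger.addHandler(logging.handlers.QueueHandler(log_queue))

listener.start()
logger.info("order %s accepted", 42)
listener.stop()  # processes everything already queued, then joins the thread
```

The application thread returns as soon as the record is enqueued, so a slow or unavailable repository does not block the code being logged.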

The central repository storage has to be able to deliver the information requested by its various consumers, for example:

  • the application developers investigating incidents
  • the customer support team investigating a lost transaction reported in a customer complaint
  • the security org doing forensics
  • the business managers demanding statistics, trends and aggregated info (BI).

The only storage capable of delivering this for any serious org (size, lifetime) is a relational engine, so probably SQL Server. Doing analysis over text files is really not going to go the distance.

So I would recommend a messaging-based log transport/delivery (MSMQ) and a relational central repository (SQL Server), perhaps with an analytical component on top of it (Analysis Services Data Mining). As you can see, this is clearly no small feat, and it covers rather more than just configuring log4net.
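As a rough illustration of what a relational repository makes easy, the sketch below stores entries in a table and runs a simple trend query. It uses Python's built-in `sqlite3` purely as a stand-in for SQL Server; the table name `log_entry`, its columns, and the function names are all assumptions made for the example.

```python
import sqlite3


def create_repository(conn):
    """Create the (hypothetical) central log table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS log_entry (
               id INTEGER PRIMARY KEY,
               logged_at TEXT NOT NULL,   -- ISO 8601 UTC timestamp
               application TEXT NOT NULL,
               host TEXT NOT NULL,
               level TEXT NOT NULL,
               message TEXT NOT NULL
           )"""
    )


def store(conn, entries):
    """Insert a batch of entries, e.g. a batch drained from the message queue."""
    conn.executemany(
        "INSERT INTO log_entry (logged_at, application, host, level, message) "
        "VALUES (:logged_at, :application, :host, :level, :message)",
        entries,
    )
    conn.commit()


def errors_per_application_per_day(conn):
    """Trend query: number of ERROR entries per application per calendar day."""
    return conn.execute(
        """SELECT application, substr(logged_at, 1, 10) AS day, COUNT(*) AS errors
           FROM log_entry
           WHERE level = 'ERROR'
           GROUP BY application, day
           ORDER BY application, day"""
    ).fetchall()
```

The same aggregation over scattered text files would need a parsing step per application format, which is the point being made about analysis over text files.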

As for what to log, you say you have already given it some thought, but I'd like to chime in with my extra 2c: oftentimes, especially during incident investigation, you will want the ability to request extra information. This means you would like to know the content of certain files on the incident machine, or some registry keys, or some performance counter values, or a full process dump. It is very useful to be able to request this information from the central repository interface, but it is impractical to always collect it just in case it is needed. This implies some sort of bidirectional communication between the application and the central repository: when the application reports an incident, it can be asked to add extra information (e.g. a dump of the process at fault). A lot of infrastructure has to be in place for something like this to work, from the protocol between application logging and the central repository, to the ability of the central repository to recognize a repeat incident, to the capacity of the logging library to collect the extra information required, and not least the ability of an operator to mark incidents as needing extra information on the next occurrence.
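The repository side of that exchange might look roughly like the sketch below. Everything in it is hypothetical: the `IncidentRegistry` class, the choice of application + error type + top stack frame as the incident fingerprint, and the shape of the reply are assumptions for illustration, not a description of any existing product.

```python
import hashlib


class IncidentRegistry:
    """Repository-side bookkeeping: recognise repeats and remember operator requests."""

    def __init__(self):
        self.occurrences = {}    # fingerprint -> number of reports seen
        self.wanted_extras = {}  # fingerprint -> extras an operator asked for

    @staticmethod
    def fingerprint(application, error_type, top_frame):
        # Assumption: these three fields are enough to identify "the same" incident.
        key = "|".join((application, error_type, top_frame))
        return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]

    def request_extras(self, fingerprint, extras):
        """Called by an operator: collect these extras on the next occurrence."""
        self.wanted_extras[fingerprint] = list(extras)

    def report(self, application, error_type, top_frame):
        """Called by the logging library; the return value is the repository's reply."""
        fp = self.fingerprint(application, error_type, top_frame)
        self.occurrences[fp] = self.occurrences.get(fp, 0) + 1
        # Hand the request out once, so extras are not collected on every repeat.
        return {"fingerprint": fp, "collect": self.wanted_extras.pop(fp, [])}
```

A typical flow: the first report comes back with an empty `collect` list, an operator then calls `request_extras(fingerprint, ["process_dump"])`, and the next report of the same incident is answered with that list so the logging library knows to attach a dump.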

I understand this answer probably seems like overkill at the moment, but I was involved with this problem space for quite a while. I looked at many online crash reports from Dr. Watson back in the day when I was with MS, and I can tell you that these requirements exist, they are valid concerns, and when implemented the solution helps tremendously. Ultimately, you can't fix what you cannot measure. A large organisation depends on good management and monitoring of its application stock, including logging and auditing.

There are some third-party vendors that offer solutions, some even integrated with log4net, like bugcollect.com (full disclosure: that's my own company), Error Traffic Controller, Exceptioneer and others.

Remus Rusanu answered Oct 23 '22 12:10


Logstash + Elasticsearch + Kibana + Redis or RabbitMQ + NLog or Log4net

Storage, search & analytics: Elasticsearch
Collecting & parsing: Logstash
Visualization: Kibana
Queue & buffer: Redis
In the application: NLog
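A pipeline like this is easier to feed if the application emits structured events rather than free text. The sketch below shows the idea in Python; the `JsonLineFormatter` name and the field names are assumptions, and it presumes the collector is configured to parse one JSON object per line.

```python
import json
import logging
from datetime import datetime, timezone


class JsonLineFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        event = {
            # Field names are illustrative; match them to your collector's configuration.
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(event)
```

Attach it to any handler with `handler.setFormatter(JsonLineFormatter())`; the collector can then index `level` and `logger` as fields instead of extracting them with patterns.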

mehmet mecek answered Oct 23 '22 12:10


The 1024 byte Syslog message length limit mentioned so far is misleading and incorrectly biases against Syslog-based solutions to the problem.

The limit for the obsolete "BSD Syslog Protocol" is indeed 1024 bytes.

The BSD syslog Protocol (RFC 3164) - 4.1 syslog Message Parts

The limit for the modern "Syslog Protocol" is implementation-dependent but MUST be at least 480 bytes, SHOULD be at least 2048 bytes, and MAY be even higher.

The Syslog Protocol (RFC 5424) - 6.1. Message Length

As an example, Rsyslog's configuration setting is called MaxMessageSize, which the documentation suggests can be set at least as high as 64 KB.

rsyslog - Configuration Directives
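The thresholds quoted above can be turned into a quick sanity check. This is a minimal sketch: the function name and the three labels are made up for illustration, and it assumes `message` is the complete syslog message as it would go on the wire.

```python
MUST_ACCEPT = 480     # every receiver must accept messages up to this many bytes
SHOULD_ACCEPT = 2048  # receivers should accept messages up to this many bytes


def classify_length(message):
    """Say how portable a message of this size is across syslog receivers."""
    size = len(message.encode("utf-8"))  # the limits are in bytes, not characters
    if size <= MUST_ACCEPT:
        return "safe"
    if size <= SHOULD_ACCEPT:
        return "likely"
    return "implementation-dependent"
```

Anything in the last category depends on how the receiver is configured, for instance through a setting like the Rsyslog one mentioned above.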

That the asker's organisation is "a Microsoft house" where "UNIX solutions are no good" should not prevent less discriminatory readers from getting accurate information.

Ron MacNeil answered Oct 23 '22 12:10