
Which is a better approach in logging - files or DB?

Okay, here's the scenario. I have a utility that processes tons of records and writes information to the database accordingly.

It works on these records in multi-threaded batches. Each batch writes to the same log file to create a workflow trace for each record. Potentially, we could be making close to a million log writes in a day.

Should this log be moved into a database residing on another server? Considerations:

  1. The obvious disadvantage of multiple threads writing to the same log file is that their log messages get interleaved. In a database, they can be grouped by batch id.
  2. Performance: which would slow down the batch processing more, writing to a local file or sending log data to a database on another server on the same network? Theoretically, the log file is faster, but is there a gotcha here?

Are there any optimizations that can be done on either approach?

Thanks.

Vaibhav asked Aug 27 '08 06:08



7 Answers

The interesting question, should you decide to log to the database, is where do you log database connection errors?

If I'm logging to a database, I always have a secondary log location (file, event log, etc) in case there are communication errors. It really does make it easier to diagnose issues later on.
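A secondary log location could be wired up roughly as follows. This is only a sketch: `DatabaseHandler` and `FallbackHandler` are hypothetical names, the `log` table layout is assumed, and `sqlite3` stands in for whatever remote database is actually used.

```python
import logging
import sqlite3


class DatabaseHandler(logging.Handler):
    """Writes each record to a `log` table (sqlite3 stands in for the real DB)."""

    def __init__(self, conn):
        super().__init__()
        self.conn = conn

    def emit(self, record):
        # Errors are allowed to propagate so the caller can fall back.
        self.conn.execute(
            "INSERT INTO log (level, message) VALUES (?, ?)",
            (record.levelname, record.getMessage()),
        )
        self.conn.commit()


class FallbackHandler(logging.Handler):
    """Tries the primary handler; on any error, uses the secondary instead."""

    def __init__(self, primary, secondary):
        super().__init__()
        self.primary = primary
        self.secondary = secondary

    def emit(self, record):
        try:
            self.primary.emit(record)
        except Exception as exc:
            # First note why the database write failed ...
            failure = logging.makeLogRecord({
                "name": record.name,
                "levelno": logging.ERROR,
                "levelname": "ERROR",
                "msg": f"database logging failed: {exc!r}",
            })
            self.secondary.handle(failure)
            # ... then keep the original message so nothing is lost.
            self.secondary.handle(record)
```

The secondary handler can be any standard handler, for example a `logging.FileHandler` pointing at a local file.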

ZombieSheep answered Dec 14 '22 22:12


One thing that comes to mind is that you could have each thread writing to its own log file and then do a daily batch run to combine them.
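One way the per-thread files and the combining run might look, as a sketch. The function names, the file naming scheme and the line format are all assumptions; the merge relies on each line starting with a fixed-width timestamp so that plain string comparison orders lines chronologically.

```python
import heapq
import threading
import time
from pathlib import Path


def write_thread_log(log_dir, lines):
    """Append timestamped lines to a file owned by the calling thread only."""
    name = threading.current_thread().name
    path = Path(log_dir) / f"{name}.log"
    with path.open("a", encoding="utf-8") as f:
        for line in lines:
            # Fixed-width timestamp: string order matches time order.
            f.write(f"{time.time():017.6f} {name} {line}\n")


def merge_logs(log_dir, out_path):
    """Merge the per-thread files into one file ordered by timestamp."""
    paths = sorted(Path(log_dir).glob("*.log"))
    files = [p.open(encoding="utf-8") for p in paths]
    try:
        with open(out_path, "w", encoding="utf-8") as out:
            # Each input file is already in time order, so a k-way merge is enough.
            out.writelines(heapq.merge(*files))
    finally:
        for f in files:
            f.close()
```

Because no two threads share a file, no locking is needed while the batch is running; the cost is paid once, in the merge.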

If you are logging to a database you will probably need to do some tuning and optimization, especially if the DB is across the network. At the very least you will need to reuse the DB connections.
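As an illustration of that kind of tuning, the sketch below keeps one connection open and sends rows in batches instead of one round trip per message. `BatchedDbLog`, the table layout and the `flush_every` value are made up for the example, and `sqlite3` stands in for the networked database.

```python
import sqlite3


class BatchedDbLog:
    """Buffers log rows and writes them in one transaction on a reused connection.

    Not thread-safe: in this sketch each worker thread would own one instance.
    """

    def __init__(self, conn, flush_every=500):
        self.conn = conn
        self.flush_every = flush_every
        self.buffer = []
        conn.execute(
            "CREATE TABLE IF NOT EXISTS log "
            "(batch_id INTEGER, record_id INTEGER, message TEXT)"
        )

    def log(self, batch_id, record_id, message):
        self.buffer.append((batch_id, record_id, message))
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        with self.conn:  # one transaction per flush, committed on success
            self.conn.executemany("INSERT INTO log VALUES (?, ?, ?)", self.buffer)
        self.buffer.clear()
```

Remember to call `flush()` at the end of each batch, otherwise the last few rows stay in memory.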

Furthermore, do you have any specific need to have the log in a database? If all you need is a "grep", then I don't think you gain much by logging to a database.

Rowan answered Dec 14 '22 22:12


I second the other answers here: it depends on what you are doing with the data.

We have two scenarios here:

  1. The majority of the logging is to a DB since admin users for the products we build need to be able to view them in their nice little app with all the bells and whistles.

  2. We log all of our diagnostics and debug info to file. We have no need for really "prettifying" it and TBH, we don't even often need it, so we just log and archive for the most part.

I would say if the user is doing anything with it, then log to the DB; if it's for you, then a file will probably suffice.

Rob Cooper answered Dec 14 '22 23:12


Not sure if it helps, but there's also a utility called Microsoft LogParser that you can supposedly use to parse text-based log files and use them as if they were a database. From the website:

Log parser is a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files, as well as key data sources on the Windows® operating system such as the Event Log, the Registry, the file system, and Active Directory®. You tell Log Parser what information you need and how you want it processed. The results of your query can be custom-formatted in text based output, or they can be persisted to more specialty targets like SQL, SYSLOG, or a chart. Most software is designed to accomplish a limited number of specific tasks. Log Parser is different... the number of ways it can be used is limited only by the needs and imagination of the user. The world is your database with Log Parser.

I haven't used the program myself, but it seems quite interesting!

onnodb answered Dec 14 '22 23:12


Or how about logging to a queue? That way you can switch out pollers whenever you like to log to different things. It makes things like rolling over and archiving log files very easy. It's also nice because you can add pollers that log to different things, for example:

  • a poller that looks for error messages and posts them to your FogBugz account
  • a poller that looks for access violations ('x tried to access /foo/y/bar.html') and writes them to a 'hacking attempts' file
  • etc.
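
In Python, for instance, the standard library's `QueueHandler` and `QueueListener` give roughly this shape: worker threads only put records on a queue, and a listener thread hands them to whichever handlers ("pollers") are plugged in. The helper name `start_queue_logging` is made up for this sketch.

```python
import logging
import logging.handlers
import queue


def start_queue_logging(logger, *handlers):
    """Route a logger through a queue; a listener thread feeds the handlers."""
    log_queue = queue.Queue()
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    # respect_handler_level lets each handler apply its own level filter,
    # e.g. one handler for everything and one for errors only.
    listener = logging.handlers.QueueListener(
        log_queue, *handlers, respect_handler_level=True
    )
    listener.start()
    return listener
```

Swapping a handler in or out only touches the listener side; the code that emits log messages does not change. Call `listener.stop()` at shutdown so queued records are drained.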

James A. Rosen answered Dec 15 '22 00:12


Database, since you mentioned multiple threads. Synchronization and filtered retrieval are my reasons.
See if you actually have a performance problem before deciding to switch to files.
Knuth: "Premature optimization is the root of all evil." I didn't get any further in that book... :)

Gishu answered Dec 14 '22 23:12


There are ways you can work around the limitations of file logging.

You can always start each log entry with a thread id of some kind, and grep out the individual thread ids. Or use a different log file for each thread.
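With Python's `logging` module, for example, the thread tag can come from the formatter rather than from each call site. The logger name, the format string and the two helper functions below are choices made for this sketch only.

```python
import logging
import threading


def make_thread_tagged_logger(path):
    """Logger whose file output carries the emitting thread's name on every line."""
    logger = logging.getLogger("batch_trace")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s [%(threadName)s] %(message)s")
    )
    logger.addHandler(handler)
    return logger


def lines_for_thread(path, thread_name):
    """Rough equivalent of: grep -F '[thread_name]' path"""
    tag = f"[{thread_name}]"
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if tag in line]
```

The handler serializes writes with its own lock, so lines from different threads interleave but are not expected to be torn mid-line.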

I've logged to a database in the past, in a separate thread at a lower priority. I must say, queryability is very valuable when you're trying to figure out what went wrong.
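A minimal sketch of the separate-thread idea follows. Python's `threading` module has no thread priorities, so this only shows the hand-off through a queue; `STOP`, `db_writer`, `messages_for_batch` and the table layout are all assumptions, with `sqlite3` standing in for the real database.

```python
import queue
import sqlite3
import threading

STOP = object()  # sentinel telling the writer thread to finish


def db_writer(log_queue, db_path):
    """Runs in its own thread: drains the queue and inserts rows."""
    # The connection is created here so it stays on the thread that uses it.
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log "
        "(batch_id INTEGER, thread TEXT, message TEXT)"
    )
    while True:
        item = log_queue.get()
        if item is STOP:
            break
        with conn:  # commit each row; batching would reduce overhead
            conn.execute("INSERT INTO log VALUES (?, ?, ?)", item)
    conn.close()


def messages_for_batch(db_path, batch_id):
    """The 'queryability' part: pull one batch's trace in insertion order."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT message FROM log WHERE batch_id = ? ORDER BY rowid",
            (batch_id,),
        ).fetchall()
    finally:
        conn.close()
    return [message for (message,) in rows]
```

Worker threads call `log_queue.put((batch_id, thread_name, message))` and never wait on the database themselves.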

Josh answered Dec 15 '22 00:12