Okay, here's the scenario. I have a utility that processes tons of records, and enters information to the Database accordingly.
It works on these records in multi-threaded batches. Each such batch writes to the same log file for creating a workflow trace for each record. Potentially, we could be making close to a million log writes in a day.
Should this log be made into a database residing on another server? Considerations:
Are there any optimizations that can be done on either approach?
Thanks.
In hindsight, a better answer is to log to BOTH file system (first, immediately) and then to a centralized database (even if delayed). Most modern logging frameworks follow a publish-subscribe model (often called logging sources and sinks) which will allow multiple logging sinks (targets) to be defined.
Advantage of the File System over Data base Management System is: When handling small data sets with arbitrary, probably unrelated data, file is more efficient than database. For simple operations, read, write, file operations are faster and simple. You can find n number of difference over internet.
If you are only logging lots and lots of simple log messages, MongoDB is a very good choice as it scales so good.
As a general rule, databases are slower than files.
The interesting question, should you decide to log to the database, is where do you log database connection errors?
If I'm logging to a database, I always have a secondary log location (file, event log, etc) in case there are communication errors. It really does make it easier to diagnose issues later on.
One thing that comes to mind is that you could have each thread writing to its own log file and then do a daily batch run to combine them.
If you are logging to database you probably need to do some tuning and optimization, especially if the DB will be across the network. At the least you will need to be reusing the DB connections.
Furthermore, do you have any specific needs to have the log in database? If all you need is a "grep " then I don't think you gain much by logging into database.
I second the other answers here, depends on what you are doing with the data.
We have two scenarios here:
The majority of the logging is to a DB since admin users for the products we build need to be able to view them in their nice little app with all the bells and whistles.
We log all of our diagnostics and debug info to file. We have no need for really "prettifying" it and TBH, we don't even often need it, so we just log and archive for the most part.
I would say if the user is doing anything with it, then log to DB, if its for you, then a file will probably suffice.
Not sure if it helps, but there's also a utility called Microsoft LogParser that you can supposedly use to parse text-based log files and use them as if they were a database. From the website:
Log parser is a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files, as well as key data sources on the Windows® operating system such as the Event Log, the Registry, the file system, and Active Directory®. You tell Log Parser what information you need and how you want it processed. The results of your query can be custom-formatted in text based output, or they can be persisted to more specialty targets like SQL, SYSLOG, or a chart. Most software is designed to accomplish a limited number of specific tasks. Log Parser is different... the number of ways it can be used is limited only by the needs and imagination of the user. The world is your database with Log Parser.
I haven't used the program myself, but it seems quite interesting!
Or how about logging to a queue? That way you can switch out pollers whenever you like to log to different things. It makes things like rolling over and archiving log files very easy. It's also nice because you can add pollers that log to different things, for example:
Database - since you mentioned multiple threads. Synchronization as well as filtered retrieval are my reasons for my answer.
See if you have a performance problem before deciding to switch to files
"Knuth: Premature optimization is the root of all evil" I didn't get any further in that book... :)
There are ways you can work around the limitations of file logging.
You can always start each log entry with a thread id of some kind, and grep out the individual thread ids. Or a different log file for each thread.
I've logged to database in the past, in a separate thread at a lower priority. I must say, queryability is very valuable when you're trying to figure out what went wrong.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With