Which is a better approach in logging - files or DB?

Tags:

logging

multithreading

Okay, here's the scenario. I have a utility that processes tons of records and writes information to the database accordingly.

It works on these records in multi-threaded batches. Every batch writes to the same log file to create a workflow trace for each record. Potentially, we could be making close to a million log writes in a day.

Should this log be made into a database residing on another server? Considerations:

1. The obvious disadvantage of multiple threads writing to the same log file is that the log messages are shuffled amongst each other. In the database, they can be grouped by batch id.
2. Performance - which would slow down the batch processing more: writing to a local file, or sending log data to a database on another server on the same network? Theoretically, the log file is faster, but is there a gotcha here?

Are there any optimizations that can be done on either approach?

Thanks.

986

asked Aug 27 '08 06:08

Vaibhav

7 Answers

The interesting question, should you decide to log to the database, is where do you log database connection errors?

If I'm logging to a database, I always have a secondary log location (file, event log, etc) in case there are communication errors. It really does make it easier to diagnose issues later on.
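
To illustrate the secondary-location idea, here is a minimal Python sketch. The class name `DbHandlerWithFallback` and the `connect` callable are made up for this example, and `sqlite3` only stands in for whatever database server you would really use. Treat it as one possible shape under those assumptions, not a finished handler.

```python
import logging
import sqlite3


class DbHandlerWithFallback(logging.Handler):
    """Write records to a database; if that fails, hand them to a fallback handler.

    Hypothetical example class. `connect` is any callable returning a sqlite3
    connection (a stand-in for a real database server connection).
    """

    def __init__(self, connect, fallback):
        super().__init__()
        self._connect = connect
        self._fallback = fallback  # e.g. a logging.FileHandler
        self._conn = None

    def emit(self, record):
        try:
            if self._conn is None:
                self._conn = self._connect()
                self._conn.execute(
                    "CREATE TABLE IF NOT EXISTS log "
                    "(created REAL, level TEXT, message TEXT)"
                )
            self._conn.execute(
                "INSERT INTO log VALUES (?, ?, ?)",
                (record.created, record.levelname, record.getMessage()),
            )
            self._conn.commit()
        except Exception as exc:  # deliberately broad: logging must not crash the app
            self._conn = None  # force a reconnect attempt on the next record
            # Keep the original message, and also note why the database write failed.
            self._fallback.handle(record)
            self._fallback.handle(
                logging.makeLogRecord(
                    {
                        "msg": f"database logging failed: {exc!r}",
                        "levelno": logging.ERROR,
                        "levelname": "ERROR",
                    }
                )
            )
```

Note that a `sqlite3` connection can by default only be used from the thread that created it, so with several worker threads you would need one connection per thread or a single writer thread.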

171

answered Dec 14 '22 22:12

ZombieSheep

One thing that comes to mind is that you could have each thread writing to its own log file and then do a daily batch run to combine them.
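
As a rough sketch of the "combine them" step, in Python: this assumes each thread writes to its own `thread-*.log` file (a naming scheme invented for this example), that every line starts with a sortable timestamp such as ISO 8601, and that each file is already in time order.

```python
import heapq
from pathlib import Path


def merge_thread_logs(log_dir, merged_path):
    """Merge per-thread log files into one file ordered by timestamp.

    Assumes every line begins with a sortable timestamp and that each
    input file is already sorted. Returns the number of files merged.
    """
    paths = sorted(Path(log_dir).glob("thread-*.log"))
    files = [p.open(encoding="utf-8") for p in paths]
    try:
        with open(merged_path, "w", encoding="utf-8") as out:
            # heapq.merge streams the inputs lazily, so large logs
            # do not have to fit in memory.
            out.writelines(heapq.merge(*files))
    finally:
        for f in files:
            f.close()
    return len(paths)
```

Since each thread owns its file, no locking is needed while the batches run, and the merge can happen offline.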

If you are logging to a database, you will probably need to do some tuning and optimization, especially if the DB is across the network. At the very least you will need to reuse the DB connections.

Furthermore, do you have any specific need to have the log in a database? If all you need is a "grep", then I don't think you gain much by logging to a database.

answered Dec 14 '22 22:12

Rowan

I second the other answers here, depends on what you are doing with the data.

We have two scenarios here:

1. The majority of the logging is to a DB since admin users for the products we build need to be able to view them in their nice little app with all the bells and whistles.
2. We log all of our diagnostics and debug info to file. We have no need for really "prettifying" it and TBH, we don't even often need it, so we just log and archive for the most part.

I would say if the user is doing anything with it, then log to DB; if it's for you, then a file will probably suffice.

answered Dec 14 '22 23:12

Rob Cooper

Not sure if it helps, but there's also a utility called Microsoft LogParser that you can supposedly use to parse text-based log files and use them as if they were a database. From the website:

Log parser is a powerful, versatile tool that provides universal query access to text-based data such as log files, XML files and CSV files, as well as key data sources on the Windows® operating system such as the Event Log, the Registry, the file system, and Active Directory®. You tell Log Parser what information you need and how you want it processed. The results of your query can be custom-formatted in text based output, or they can be persisted to more specialty targets like SQL, SYSLOG, or a chart. Most software is designed to accomplish a limited number of specific tasks. Log Parser is different... the number of ways it can be used is limited only by the needs and imagination of the user. The world is your database with Log Parser.

I haven't used the program myself, but it seems quite interesting!

answered Dec 14 '22 23:12

onnodb

Or how about logging to a queue? That way you can switch out pollers whenever you like to log to different things. It makes things like rolling over and archiving log files very easy. It's also nice because you can add pollers that log to different things, for example:

a poller that looks for error messages and posts them to your FogBugz account
a poller that looks for access violations ('x tried to access /foo/y/bar.html') and writes them to a 'hacking attempts' file
etc.
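
For what it's worth, the queue-plus-pollers idea can be sketched with Python's standard library, where `QueueHandler` puts records on a queue and `QueueListener` plays the role of the poller. The helper name `start_queue_logging` is invented for this example, and an in-process `queue.Queue` is an assumption. A real setup might use an external message queue instead.

```python
import logging
import logging.handlers
import queue


def start_queue_logging(logger, *consumers):
    """Route a logger's records through a queue to any number of consumers.

    Worker threads only pay for a queue put; a listener thread forwards
    each record to the consumer handlers (file, database, bug tracker, ...).
    """
    q = queue.Queue()
    logger.addHandler(logging.handlers.QueueHandler(q))
    # respect_handler_level lets a consumer see only what it cares about,
    # for example an "errors only" consumer set to level ERROR.
    listener = logging.handlers.QueueListener(
        q, *consumers, respect_handler_level=True
    )
    listener.start()
    return listener  # call listener.stop() at shutdown to drain the queue
```

Swapping or adding a consumer then means passing a different handler, without touching the code that does the logging.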

answered Dec 15 '22 00:12

James A. Rosen

Database - since you mentioned multiple threads. Synchronization and filtered retrieval are the reasons for my answer.
See if you have a performance problem before deciding to switch to files.
Knuth: "Premature optimization is the root of all evil." I didn't get any further in that book... :)

answered Dec 14 '22 23:12

Gishu

There are ways you can work around the limitations of file logging.

You can always start each log entry with a thread id of some kind, and grep out the individual thread ids. Or use a different log file for each thread.
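
A small Python sketch of the thread-id prefix, assuming the standard `logging` module. The function names and the log format are just illustrative choices, and `lines_for_thread` is a stand-in for running grep on the file.

```python
import logging

# %(threadName)s puts the name of the logging thread into every entry.
LOG_FORMAT = "%(asctime)s [%(threadName)s] %(levelname)s %(message)s"


def make_logger(name, path):
    """Return a logger that writes thread-tagged lines to one shared file."""
    logger = logging.getLogger(name)
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def lines_for_thread(path, thread_name):
    """Equivalent of grepping the shared file for one thread's tag."""
    tag = f"[{thread_name}]"
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if tag in line]
```

If each batch runs in a thread named after its batch id, the shuffled file can be untangled per batch after the fact.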

I've logged to database in the past, in a separate thread at a lower priority. I must say, queryability is very valuable when you're trying to figure out what went wrong.
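
Here is roughly what a separate logging thread could look like in Python. This is a sketch under assumptions, not what I actually ran: `DbLogWriter`, `read_batch`, and the table layout are invented for the example, `sqlite3` stands in for a real database server, and Python threads have no priority setting, so the "lower priority" part is not reproduced.

```python
import queue
import sqlite3
import threading


class DbLogWriter:
    """Collect log entries from many threads and insert them from one writer thread."""

    _STOP = object()

    def __init__(self, db_path, flush_size=100):
        self._queue = queue.Queue()
        self._db_path = db_path
        self._flush_size = flush_size
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def log(self, batch_id, message):
        # Cheap for the caller: no database work happens on the worker thread.
        self._queue.put((batch_id, message))

    def close(self):
        self._queue.put(self._STOP)
        self._thread.join()

    def _run(self):
        # One connection, opened and used only by the writer thread.
        conn = sqlite3.connect(self._db_path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS log "
            "(id INTEGER PRIMARY KEY, batch_id INTEGER, message TEXT)"
        )
        conn.commit()
        pending = []
        while True:
            item = self._queue.get()
            stopping = item is self._STOP
            if not stopping:
                pending.append(item)
            # Flush when stopping, when the batch is full, or when idle.
            if pending and (
                stopping
                or len(pending) >= self._flush_size
                or self._queue.empty()
            ):
                conn.executemany(
                    "INSERT INTO log (batch_id, message) VALUES (?, ?)", pending
                )
                conn.commit()
                pending.clear()
            if stopping:
                break
        conn.close()


def read_batch(db_path, batch_id):
    """Return one batch's messages in the order they were written."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT message FROM log WHERE batch_id = ? ORDER BY id", (batch_id,)
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

The `read_batch` query is where the queryability pays off: the trace for one batch comes back in order, no matter how the threads interleaved.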

answered Dec 15 '22 00:12

Josh
