How to get threads to stop blocking each other from writing to log file on disk?

My threads have fallen behind schedule, and a thread dump reveals they're all stuck in blocking IO, writing log output to the hard disk. My quick fix is simply to reduce logging, which is easy enough to do given my QA requirements. Of course, this isn't vertically scalable, which will be a problem soon enough.

I thought about just increasing the thread count, but I'm guessing the bottleneck is file contention, and this could backfire badly if it's the wrong thing to do.

I have a lot of ideas but really no idea which are fruitful.

  1. I thought about increasing the thread count, but I'm guessing the threads are bottlenecked, so this won't do anything. Is this correct? How can I determine it? Could decreasing the thread count help?
  2. How do I profile the right number of threads to be writing to disk? Is this a function of the number of write requests, the number of bytes written per second, the number of bytes per write op, or something else?
  3. Can I toggle a lower-level setting (filesystem, OS, etc.) to reduce locking on a file in exchange for possibly out-of-order lines, either in my Java application or at a lower level?
  4. Can I profile my system or hard disk to make sure it's not being overworked? (Vague, but I'm out of my domain here.)

So my question is: how do I profile to determine the right number of threads that can safely write to a common file? What variables determine this: the number of write operations, the number of bytes written per second, the number of bytes per write request, or any OS or hard disk characteristics?

Also, is there any way to make the log file more permissive about concurrent writes? We timestamp everything, so I'm okay with a minority of out-of-order lines if it reduces blocking.

djechlin asked Feb 16 '23 14:02


1 Answer

My threads have fallen behind schedule and a thread dump reveals they're all stuck in blocking IO writing log output to hard disk.

Typically in these situations, I dedicate a thread just to logging. Many logging classes (such as PrintStream) are synchronized, so every thread contends for the same lock, and they often write and flush each line of output individually. By moving to a central logging thread and using some sort of BlockingQueue to queue up the log messages, you can write through a BufferedWriter or similar and cut down the number of individual IO requests. BufferedWriter's default buffer size is 8192 characters, but you should consider increasing it. You'll need to make sure that you properly flush and close the stream when your application shuts down.
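A minimal sketch of the dedicated logging thread described above. The class name `AsyncLogger`, the sentinel-based shutdown, and the flush-only-when-the-queue-is-empty policy are my own assumptions for illustration, not a specific library's API, and error handling is deliberately minimal. Because OutputStreamWriter has only a modest internal byte buffer, the large buffer is placed on the byte side (a BufferedOutputStream) so that the actual disk writes stay big.

```java
import java.io.BufferedOutputStream;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical single-writer logger: many threads enqueue, one thread does all disk IO.
final class AsyncLogger implements AutoCloseable {
    // Identity-compared sentinel; a caller logging the same text produces a different object.
    private static final String POISON = new String("<shutdown>");

    private final BlockingQueue<String> queue;
    private final BufferedWriter out;
    private final Thread writerThread;

    AsyncLogger(Path file, int queueCapacity, int bufferBytes) throws IOException {
        this.queue = new LinkedBlockingQueue<>(queueCapacity);
        // Append to the log through a large byte buffer so each disk write carries many lines.
        this.out = new BufferedWriter(new OutputStreamWriter(
                new BufferedOutputStream(
                        Files.newOutputStream(file, StandardOpenOption.CREATE, StandardOpenOption.APPEND),
                        bufferBytes),
                StandardCharsets.UTF_8));
        this.writerThread = new Thread(this::drain, "log-writer");
        this.writerThread.start();
    }

    // Called by worker threads; blocks only when the queue is full (back-pressure).
    void log(String line) throws InterruptedException {
        queue.put(line);
    }

    private void drain() {
        try {
            while (true) {
                String line = queue.take();
                if (line == POISON) {
                    break;
                }
                out.write(line);
                out.newLine();
                // Flush only when nothing else is waiting, so bursts go out in large chunks.
                if (queue.isEmpty()) {
                    out.flush();
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            try {
                out.close(); // flushes whatever is still buffered
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    // Enqueues the sentinel behind all pending lines and waits for the writer to finish.
    @Override
    public void close() throws InterruptedException {
        queue.put(POISON);
        writerThread.join();
    }
}
```

Worker threads now contend only on the queue, which is far cheaper than contending on a disk write. Lines from different threads end up in queue order, which fits the asker's tolerance for occasional out-of-order lines.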

With a buffered writer, you could additionally write through a GZIPOutputStream, which would significantly lower your IO requirements if your log messages are highly repetitive.
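A hedged sketch of writing the log through `java.util.zip.GZIPOutputStream`. `GzipLogWriter` is a hypothetical helper name; the 64 KB deflater buffer is an arbitrary choice.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: a buffered writer whose output is gzip-compressed before it hits the disk.
final class GzipLogWriter {
    private GzipLogWriter() {}

    static BufferedWriter open(Path file, int bufferChars) throws IOException {
        // Creates or truncates the file; the second argument is the deflater's output buffer size.
        GZIPOutputStream gzip = new GZIPOutputStream(Files.newOutputStream(file), 64 * 1024);
        // Closing the returned writer also closes the gzip stream, which writes the gzip trailer.
        return new BufferedWriter(new OutputStreamWriter(gzip, StandardCharsets.UTF_8), bufferChars);
    }
}
```

The trade-offs are extra CPU for compression and a file that is not a complete gzip stream until it is closed, so anything still buffered is lost if the process dies.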

That said, if you are outputting too much debugging information, you may be out of luck and need to either reduce your logging bandwidth or increase your disk IO speed. Once you've optimized your application, the next step is moving your log server to an SSD to handle the load. You could also try distributing the log messages to multiple servers to be persisted, but a local SSD would most likely be faster.

To simulate the benefits of an SSD, a local RAM disk should give you a pretty good idea of the increased IO bandwidth.

I thought about increasing the thread count but I'm guessing they're bottlenecked so this won't do anything. Is this correct?

If all your threads are blocked in IO, then yes, increasing the thread count will not help.
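One way to check this from inside the JVM, sketched with the standard `java.lang.management` API: count threads in the `BLOCKED` state, grouped by the monitor they are waiting for. `BlockedThreadReport` is a hypothetical name. A thread that is inside the native disk write itself usually shows as `RUNNABLE`, so many `BLOCKED` threads on one logger or PrintStream monitor is consistent with the lock-plus-slow-IO situation described in the question.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic: which monitors are threads currently blocked on, and how many?
final class BlockedThreadReport {
    private BlockedThreadReport() {}

    // Returns lock name (ClassName@identityHash) -> number of BLOCKED threads waiting for it.
    static Map<String, Integer> blockedByLock() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new TreeMap<>();
        // false, false: skip the more expensive locked-monitor and synchronizer details.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                counts.merge(String.valueOf(info.getLockName()), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Sampling this a few times while the application is behind schedule gives a rough answer to "are they all waiting on the same thing?" without reading full thread dumps by hand.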

How do I profile the right # of threads to be writing to disk?

Tough question. You are going to have to do some test runs: measure the throughput of your application with 10 threads, with 20, etc. You are trying to maximize the total number of transactions processed in a given time. Make sure each test run lasts a couple of minutes for best results. But it is important to realize that a single thread can easily swamp a disk or network IO stream if it is spewing too much output.
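The test runs described above could be automated with a small harness like this. It is only a sketch: `ThreadCountBenchmark` and `unitOfWork` are hypothetical names, and `unitOfWork` stands in for one real transaction of your application, logging included.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical harness: how many units of work do N threads complete in a fixed time?
final class ThreadCountBenchmark {
    private ThreadCountBenchmark() {}

    static long run(int nThreads, long durationMillis, Runnable unitOfWork) throws InterruptedException {
        LongAdder completed = new LongAdder();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMillis);
        Thread[] workers = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            workers[i] = new Thread(() -> {
                // Subtraction is the overflow-safe way to compare nanoTime values.
                while (System.nanoTime() - deadline < 0) {
                    unitOfWork.run();
                    completed.increment();
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return completed.sum();
    }

    // Prints throughput for each thread count so the best setting can be read off.
    static void sweep(int[] threadCounts, long durationMillis, Runnable unitOfWork) throws InterruptedException {
        for (int n : threadCounts) {
            long ops = run(n, durationMillis, unitOfWork);
            System.out.printf("%3d threads: %.1f ops/s%n", n, ops * 1000.0 / durationMillis);
        }
    }
}
```

For real measurements, something like `sweep(new int[] {5, 10, 20, 40}, 120_000, work)` matches the "couple of minutes per run" advice; if throughput stops rising (or falls) as threads are added, you have found the IO ceiling.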

Can I toggle a lower-level setting (filesystem, OS, etc.) to reduce locking on a file in exchange for out-of-order lines being possible? Either in my Java application or lower level?

No. See the dedicated, buffered logging thread described above. This is not about file locking, which (I assume) is not happening; it is about the number of IO requests per second.
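To make the point about IO requests concrete, this sketch counts how many write calls reach the underlying stream when every line is flushed, versus when lines accumulate in a 64 KB byte buffer. `CountingOutputStream` and `IoRequestDemo` are hypothetical helpers; with a FileOutputStream underneath, each counted call would roughly correspond to one write request to the OS.

```java
import java.io.BufferedOutputStream;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical sink that only counts write calls, standing in for the disk.
final class CountingOutputStream extends OutputStream {
    long writeCalls;

    @Override
    public void write(int b) {
        writeCalls++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
        writeCalls++;
    }
}

final class IoRequestDemo {
    private IoRequestDemo() {}

    // Writes `lines` log lines and reports how many write calls reached the sink.
    static long writeCalls(int lines, int bufferBytes, boolean flushEachLine) throws IOException {
        CountingOutputStream sink = new CountingOutputStream();
        try (Writer w = new BufferedWriter(new OutputStreamWriter(
                new BufferedOutputStream(sink, bufferBytes), StandardCharsets.UTF_8))) {
            for (int i = 0; i < lines; i++) {
                w.write("request " + i + " handled\n");
                if (flushEachLine) {
                    w.flush(); // pushes this one line all the way down to the sink
                }
            }
        }
        return sink.writeCalls;
    }
}
```

Flushing per line produces at least one write call per line, while the buffered version collapses the same 10,000 lines into a handful of large writes. The bytes on disk are identical; only the request count changes.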

Can I profile my system or hard disk to ensure it's not somehow being overworked? (Vague, but I'm out of my domain here).

If you are IO bound, then IO is what is slowing you down, so in that sense the disk is "overworked". Moving to an SSD or a RAM disk is an easy test to see whether your application runs faster.
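To try the SSD or RAM-disk comparison before changing the application, a crude probe like this writes a fixed amount of data into a given directory and forces it to the device. `DiskWriteProbe` is a hypothetical helper and the numbers it produces are rough. Point it at a directory on the current log disk and at one on a RAM disk (on many Linux systems `/dev/shm` is a tmpfs) and compare. Varying `chunkBytes` also shows how much the size of each write matters, one of the variables raised in the question.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical probe: sequential write throughput of the device backing a directory.
final class DiskWriteProbe {
    private DiskWriteProbe() {}

    static double mbPerSecond(Path dir, long totalBytes, int chunkBytes) throws IOException {
        Path file = Files.createTempFile(dir, "probe", ".tmp");
        byte[] chunk = new byte[chunkBytes];
        Arrays.fill(chunk, (byte) 'x');
        long elapsedNanos = 0;
        try (FileOutputStream out = new FileOutputStream(file.toFile())) {
            long start = System.nanoTime();
            for (long written = 0; written < totalBytes; written += chunkBytes) {
                // One write call per chunk, so chunkBytes sets the size of each IO request.
                out.write(chunk, 0, (int) Math.min(chunkBytes, totalBytes - written));
            }
            out.getFD().sync(); // force data to the device so the page cache does not hide the cost
            elapsedNanos = System.nanoTime() - start;
        } finally {
            Files.deleteIfExists(file);
        }
        double seconds = Math.max(elapsedNanos, 1) / 1e9;
        return totalBytes / 1e6 / seconds;
    }
}
```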

Gray answered May 06 '23 17:05