 

Minimizing Java Thread Context Switching Overhead

I have a Java application running on a Sun 1.6 32-bit VM / Solaris 10 (x86) / Nehalem 8-core machine (2 threads per core).

A specific use case in the application is to respond to some external message. In my performance test environment, when I prepare and send the response in the same thread that receives the external input, I see about a 50 us advantage over handing the message off to a separate thread to send the response. I use a ThreadPoolExecutor with a SynchronousQueue to do the handoff.
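For reference, a stripped-down sketch of the handoff I described (class and method names here are illustrative, not from my actual application):

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ResponderHandoff {
    // SynchronousQueue has no capacity: each task is handed directly to
    // a waiting worker thread, so there is no buffering between threads.
    private final ThreadPoolExecutor executor = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS, new SynchronousQueue<Runnable>());

    public void onMessage(final byte[] message) {
        final long submittedAt = System.nanoTime();
        executor.execute(new Runnable() {
            public void run() {
                // Handoff latency: submission to pickup by the worker.
                long handoffNanos = System.nanoTime() - submittedAt;
                sendResponse(message, handoffNanos);
            }
        });
    }

    private void sendResponse(byte[] message, long handoffNanos) {
        // ... prepare and send the response ...
    }
}
```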

In your experience, what is an acceptable delay between scheduling a task to a thread pool and it getting picked up for execution? What ideas have worked for you in the past to improve this?

— asked by Binil Thomas, May 28 '10

3 Answers

The "acceptable delay" entirely depends on your application. Dealing with everything on the same thread can indeed help if you've got very strict latency requirements. Fortunately most applications don't have requirements quite that strict.

Of course, if only one thread is able to receive requests, then tying up that thread to compute the response means you can't accept any other requests. Depending on what you're doing, you can use asynchronous IO (etc.) to avoid the "thread per request" model, but it's significantly harder IMO, and it still ends up with thread context switching.
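As a rough illustration of that non-blocking model (not from the question - the port and the echo-back behavior are placeholders), a single selector thread can multiplex many connections:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// One thread services many connections instead of one thread per request.
public class SelectorLoop {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.socket().bind(new InetSocketAddress(9000)); // port is arbitrary
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buffer = ByteBuffer.allocate(4096);
        while (true) {
            selector.select(); // blocks until at least one channel is ready
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buffer.clear();
                    int n = client.read(buffer);
                    if (n < 0) {
                        client.close();
                    } else {
                        buffer.flip();
                        // Echo back in place of a real response; a real server
                        // would track interest ops for partial writes.
                        client.write(buffer);
                    }
                }
            }
        }
    }
}
```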

Sometimes it's appropriate to queue requests to avoid having too many threads processing them: if your handling is CPU-bound, it doesn't make much sense to have hundreds of threads - better to have a producer/consumer queue of tasks and distribute them at roughly one thread per core. That's basically what ThreadPoolExecutor will do if you set it up properly, of course. That doesn't work as well if your requests spend a lot of their time waiting for external services (including disks, but primarily other network services)... at that point you either need to use asynchronous execution models whenever you would otherwise make a core idle with a blocking call, or you take the thread context switching hit, have lots of threads, and rely on the thread scheduler to make it work well enough.
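For the CPU-bound case, that setup is short (a sketch; note that newFixedThreadPool backs the workers with an unbounded LinkedBlockingQueue, which serves as the producer/consumer queue):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CpuBoundPool {
    public static void main(String[] args) {
        // Size the pool to the hardware, not the request rate.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        // ... submit CPU-bound tasks to 'pool' ...
        pool.shutdown();
    }
}
```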

The bottom line is that latency requirements can be tough - in my experience they're significantly tougher than throughput requirements, as they're much harder to scale out. It really does depend on the context though.

— answered by Jon Skeet

50us sounds somewhat high for a handoff. In my experience (Solaris 10/Opteron), LinkedBlockingQueue (LBQ) is typically in the 30-35us range, while LinkedTransferQueue (LTQ) is about 5us faster than that. As stated in the other replies, SynchronousQueue may tend to be slightly slower because a put doesn't return until the other thread has taken the item.
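For what it's worth, a crude probe along these lines might look like the following (a sketch only - no warmup, pinning, or statistics, so treat the numbers with suspicion; swap in LinkedBlockingQueue or LinkedTransferQueue to compare):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;

// The producer enqueues a timestamp; the consumer measures how long
// it took to arrive on the other thread.
public class HandoffProbe {
    public static void main(String[] args) throws InterruptedException {
        final BlockingQueue<Long> queue = new SynchronousQueue<Long>();

        Thread consumer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 0; i < 100000; i++) {
                        long sentAt = queue.take();
                        long latencyNanos = System.nanoTime() - sentAt;
                        if (i == 99999) {
                            System.out.println("last handoff: "
                                    + latencyNanos / 1000 + " us");
                        }
                    }
                } catch (InterruptedException ignored) {
                }
            }
        });
        consumer.start();

        for (int i = 0; i < 100000; i++) {
            queue.put(System.nanoTime());
        }
        consumer.join();
    }
}
```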

In my results, Solaris 10 is markedly slower than Linux at this; Linux sees times under 10us.

It really depends on a few things. Under peak load:

  • how many requests per second are you servicing?
  • how long does it typically take to process a request?

If you know the answers to those questions, it should be fairly clear, on performance grounds, whether you should handle the message in the receiving thread or hand it off to a processing thread.
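For example (illustrative numbers only): at 5,000 requests/s with ~100us of processing per request, the receiving thread is only about 50% utilized, so a 50us handoff adds 50% to each response's latency for no throughput gain. At 50,000 requests/s the same thread would need 5 seconds of work per second, so handing off to a pool of workers becomes necessary despite the handoff cost.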

— answered by Matt


Is there a reason why you don't use a LinkedBlockingQueue, so your producer can queue up a couple of items, instead of a SynchronousQueue? At the very least, use a queue with a capacity of 1 so you can get better parallelism.
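A sketch of that change (pool size and queue capacity are illustrative):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BufferedHandoff {
    // Capacity 1 is the minimum suggested above: the producer can
    // enqueue one task and return instead of waiting for a rendezvous.
    private final ThreadPoolExecutor executor = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<Runnable>(1));

    public BufferedHandoff() {
        // If the buffer is full, run the task in the submitting thread
        // rather than throwing RejectedExecutionException.
        executor.setRejectedExecutionHandler(
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}
```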

What is the speed of the "prepare" step versus sending the "response"? If the responses are too expensive, can you use a thread pool with multiple threads handling them?

— answered by Gray