We have a multithreaded application that does heavy packet processing across multiple pipeline stages. The application is written in C and runs on Linux.
The application works fine and has no memory leaks or thread-safety issues. However, we would like to analyse its behaviour: how can we profile and analyse the threads?
In particular, here is what we are interested in:
What are the best techniques and tools available for this?
A multi-threaded application is an application whose architecture takes advantage of the multi-threading provided by the operating system. Usually, these applications assign specific jobs to individual threads within the process and the threads communicate, through various means, to synchronize their actions.
Multithreading is the ability of a program or an operating system to serve more than one user at a time without requiring multiple copies of the program to run on the computer. A multithreaded program can also handle multiple requests from the same user.
On a multiprocessor system, multiple threads can run concurrently on multiple CPUs. Therefore, a multithreaded program can run much faster there than on a uniprocessor system. It can also be faster than a program using multiple processes, because threads require fewer resources and generate less overhead.
Take a look at Intel VTune Amplifier XE (formerly … Intel Thread Profiler) to see if it will meet your needs. This and other Intel Linux development tools are available free for non-commercial use.
The video "Using the Timeline in Intel VTune Amplifier XE" demonstrates a timeline of a multi-threaded application. The presenter uses a graphic display to show lock activity and how to dig down to the source line of the particular lock causing serialization. At 9:20 the presenter mentions "with the frame API you can programmatically mark certain events or phases in your code. And these marks will appear on the timeline."
I worked on a similar system some years ago. Here's how I did it:
Step 1. Get rid of unnecessary time-takers in individual threads. For that I used this technique. This is important to do because the overall messaging system is limited by the speed of its parts.
Step 2. This part is hard work but it pays off. For each thread, print a time-stamped log showing when each message was sent, received, and acted upon. Then merge the logs into a common timeline and study it. What you are looking for is a) unnecessary retransmissions, for example due to timeouts, and b) extra delay between the time a message is received and the time it is acted upon. The latter can happen, for example, if a thread has multiple messages in its input queue, some of which can be processed more quickly than others. It makes sense to process the quick ones first.
You may need to alternate between these two.
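The logging in Step 2 could be sketched roughly as below. This is only an illustration, not the code I actually used: the names `ev_log`, `ev_record`, `ev_merge` and `ev_print`, the three event kinds, and the integer thread and message ids are all made up for the example. It assumes each thread owns its own in-memory log (so recording needs no lock and perturbs timing as little as possible), that timestamps come from `clock_gettime(CLOCK_MONOTONIC)`, and that the logs are merged only after the threads have stopped.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical event kinds: the three moments named in Step 2. */
enum ev_kind { EV_SENT, EV_RECEIVED, EV_ACTED };

struct ev {
    uint64_t t_ns;      /* CLOCK_MONOTONIC timestamp, nanoseconds */
    int thread_id;      /* application-assigned thread/stage number */
    int msg_id;         /* application-assigned message number */
    enum ev_kind kind;
};

/* One log per thread, so recording needs no locking. */
struct ev_log {
    struct ev *buf;
    size_t len, cap;
    int thread_id;
};

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Preallocate the buffer up front; returns 0 on success, -1 on failure. */
static int ev_log_init(struct ev_log *log, int thread_id, size_t cap)
{
    assert(log != NULL && cap > 0);
    log->buf = malloc(cap * sizeof *log->buf);
    log->len = 0;
    log->cap = log->buf ? cap : 0;
    log->thread_id = thread_id;
    return log->buf ? 0 : -1;
}

/* Record one event; events are dropped once the buffer is full. */
static void ev_record(struct ev_log *log, int msg_id, enum ev_kind kind)
{
    if (log->len == log->cap)
        return;
    struct ev *e = &log->buf[log->len++];
    e->t_ns = now_ns();
    e->thread_id = log->thread_id;
    e->msg_id = msg_id;
    e->kind = kind;
}

static int ev_cmp(const void *a, const void *b)
{
    const struct ev *x = a, *y = b;
    return (x->t_ns > y->t_ns) - (x->t_ns < y->t_ns);
}

/* Concatenate all per-thread logs and sort by timestamp.
 * Returns a malloc'd array (caller frees) or NULL on allocation failure. */
static struct ev *ev_merge(const struct ev_log *logs, size_t nlogs,
                           size_t *out_len)
{
    size_t total = 0, n = 0;
    for (size_t i = 0; i < nlogs; i++)
        total += logs[i].len;
    struct ev *all = malloc((total ? total : 1) * sizeof *all);
    if (!all)
        return NULL;
    for (size_t i = 0; i < nlogs; i++) {
        if (logs[i].len > 0)
            memcpy(all + n, logs[i].buf, logs[i].len * sizeof *all);
        n += logs[i].len;
    }
    qsort(all, total, sizeof *all, ev_cmp);
    *out_len = total;
    return all;
}

/* Print the merged timeline, with times relative to the first event. */
static void ev_print(const struct ev *all, size_t n, FILE *out)
{
    static const char *names[] = { "sent", "received", "acted" };
    for (size_t i = 0; i < n; i++)
        fprintf(out, "%12.3f us  thread %d  msg %d  %s\n",
                (double)(all[i].t_ns - all[0].t_ns) / 1e3,
                all[i].thread_id, all[i].msg_id, names[all[i].kind]);
}
```

In use, each pipeline thread would call `ev_log_init` once and then `ev_record` at every send, receive, and "acted upon" point; after the threads are joined, `ev_merge` followed by `ev_print` gives the common timeline to study. The gap between a message's "received" and "acted" lines is the delay described in b).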
Don't expect this to be easy. Some programmers are too fine to be bothered with this kind of dirty work. But you could be pleasantly surprised at how fast you can make the whole thing go.