Profiling distributed systems

I am wondering about ways to track down performance bottlenecks in distributed systems. I am aware of tools like X-Trace and its offspring (e.g. Dapper), but I am more curious about the methodology than about specific tools.

In other words, given a distributed system without any obvious bottlenecks, how do you study and improve its performance?

redblackbit asked Oct 21 '22 12:10


1 Answer

I've used a method that has one pro and one con. The pro is that it works: it finds problems that, once fixed, result in nice snappy performance. The con is that it takes a good amount of manual work.

I even wrote a book that includes the method. The work is to collect time-stamped event logs and merge them into a common timeline. Then you examine that timeline carefully, tracing the flow of related messages through the network of asynchronous agents. What you are looking for are needless message cycles, or delays that don't have to happen. For example, in this picture, receipt of a message is delayed by the task "post status to DB". Once that is understood, the fix follows: the posting could be done on a separate thread.

[image]
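To make the merge-and-trace step concrete, here is a minimal Python sketch. It is only an illustration, not the tooling from the book: the `Event` record, the node names `A` and `B`, the message IDs, and the sample timestamps are all made up, and it assumes each node's log is already sorted and that clocks are synchronized closely enough for timestamps to be compared across nodes.

```python
import heapq
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    ts: int       # milliseconds; assumed comparable across nodes
    node: str     # which agent logged the event
    msg_id: str   # ID of the message or task the event belongs to
    what: str     # free-form description


def merge_logs(*logs):
    """Merge per-node logs (each already sorted by ts) into one timeline."""
    return list(heapq.merge(*logs, key=lambda e: e.ts))


def gaps_by_message(timeline):
    """Delays between consecutive events of the same message, largest first."""
    by_msg = defaultdict(list)
    for ev in timeline:
        by_msg[ev.msg_id].append(ev)
    gaps = []
    for evs in by_msg.values():
        for prev, cur in zip(evs, evs[1:]):
            gaps.append((cur.ts - prev.ts, prev, cur))
    return sorted(gaps, key=lambda g: g[0], reverse=True)


def activity_during(timeline, node, start, end):
    """What a node was doing strictly between two timestamps."""
    return [e for e in timeline if e.node == node and start < e.ts < end]


# Hypothetical logs: A sends m1 to B, but B is busy with task m0.
log_a = [
    Event(0, "A", "m1", "send"),
    Event(130, "A", "m1", "recv reply"),
]
log_b = [
    Event(1, "B", "m0", "post status to DB: start"),
    Event(118, "B", "m0", "post status to DB: done"),
    Event(120, "B", "m1", "recv"),
    Event(125, "B", "m1", "send reply"),
]

timeline = merge_logs(log_a, log_b)
delay, before, after = gaps_by_message(timeline)[0]
blockers = activity_during(timeline, after.node, before.ts, after.ts)

print(f"{delay} ms between {before.what!r} on {before.node} "
      f"and {after.what!r} on {after.node}")
for ev in blockers:
    print(f"  {ev.node} was busy: {ev.what}")
```

The code only surfaces candidates. Deciding whether a delay is needless (here, whether the DB post could move to another thread) is still the manual part.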

Mike Dunlavey answered Oct 27 '22 20:10