Profiling distributed systems

Question

I was wondering about possible ways to track down performance bottlenecks in distributed systems. I am aware of tools like X-Trace and its offspring (e.g. Dapper), but I am more curious about the methodology than about specific tools.

In other words, given a distributed system without any obvious bottlenecks, how do you study and improve its performance?

Mike Dunlavey · Accepted Answer

I've used a method that has a pro and a con. The pro is that it works: it finds problems that, once fixed, result in nice, snappy performance. The con is that it takes a good amount of manual work.

I even wrote a book and included the method in it. The work is to collect time-stamped event logs and merge them into a common timeline. Then you carefully examine that timeline, tracing the flow of related messages through the network of asynchronous agents. What you are looking for are needless message cycles, or delays that don't need to happen. For example, in this picture, receipt of a message is being delayed by the task "post status to DB". Once that is understood, the posting can be done on a separate thread.
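The answer describes the analysis as manual, but the first two steps (merging logs and spotting suspicious gaps) can be scripted to shorten the reading. Below is a minimal sketch, not the author's tooling. It assumes each node writes events of hypothetical kinds `send`, `recv`, `task_start` and `task_end`, that a send and its receive share a message id, and that timestamps are already on a common clock (in practice, clock skew between machines has to be corrected first). `merge_logs` interleaves the per-node logs with `heapq.merge`. `find_delays` flags messages whose send-to-receive gap exceeds a threshold and lists the tasks that were running on the receiving node during that gap. The sample data is invented to mirror the "post status to DB" example.

```python
import heapq
from typing import NamedTuple


class Event(NamedTuple):
    ts: float    # seconds on a shared clock (assumes skew is already corrected)
    node: str    # process or machine that logged the event
    kind: str    # hypothetical kinds: "send", "recv", "task_start", "task_end"
    ref: str     # message id for send/recv, task name for task_start/task_end


def merge_logs(*logs):
    """Merge per-node logs (each already sorted by ts) into one timeline."""
    return list(heapq.merge(*logs, key=lambda e: e.ts))


def find_delays(timeline, threshold):
    """Return (msg_id, latency, tasks running on the receiver) for slow messages."""
    sends, recvs, open_tasks, tasks = {}, {}, {}, []
    for e in timeline:
        if e.kind == "send":
            sends[e.ref] = e
        elif e.kind == "recv":
            recvs[e.ref] = e
        elif e.kind == "task_start":
            open_tasks[(e.node, e.ref)] = e.ts
        elif e.kind == "task_end":
            start = open_tasks.pop((e.node, e.ref), None)
            if start is not None:
                tasks.append((e.node, e.ref, start, e.ts))

    report = []
    for msg_id, recv in recvs.items():
        send = sends.get(msg_id)
        if send is None:
            continue  # receive with no matching send in these logs
        latency = recv.ts - send.ts
        if latency <= threshold:
            continue
        # Tasks on the receiving node whose interval overlaps the in-flight gap
        blockers = [name for node, name, start, end in tasks
                    if node == recv.node and start < recv.ts and end > send.ts]
        report.append((msg_id, latency, blockers))
    return report


# Invented sample logs for two nodes, A (sender) and B (receiver)
node_a = [Event(0.0, "A", "send", "m1"), Event(1.0, "A", "send", "m2")]
node_b = [Event(0.1, "B", "recv", "m1"),
          Event(0.5, "B", "task_start", "post status to DB"),
          Event(2.5, "B", "task_end", "post status to DB"),
          Event(2.6, "B", "recv", "m2")]

for msg_id, latency, blockers in find_delays(merge_logs(node_a, node_b), threshold=0.5):
    print(msg_id, round(latency, 3), blockers)
# m2 1.6 ['post status to DB']
```

An overlapping task is only a candidate cause, not proof. The point of the script is to tell you where to look; you still read that stretch of the merged timeline to decide whether the delay is needless.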
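The fix mentioned at the end, moving the status posting onto a separate thread, can be sketched with the standard library's `queue.Queue` and `threading.Thread`. This is an illustration under assumptions, not code from the answer: `post_status_to_db`, `StatusPoster` and `handle_message` are hypothetical names, and the database write is simulated with `time.sleep`. The message handler only enqueues the status and returns, so the next receive is no longer held up by the write; a background worker drains the queue.

```python
import queue
import threading
import time


def post_status_to_db(status):
    """Hypothetical stand-in for a slow database write."""
    time.sleep(0.1)


class StatusPoster:
    """Runs post_status_to_db on a background thread so callers never wait on it."""

    def __init__(self):
        self._q = queue.Queue()
        self.posted = []  # written only by the worker; read after close()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            status = self._q.get()
            if status is None:  # sentinel: stop the worker
                break
            post_status_to_db(status)
            self.posted.append(status)

    def post(self, status):
        self._q.put(status)  # returns immediately

    def close(self):
        self._q.put(None)    # queued after any pending statuses
        self._worker.join()  # wait for the pending writes to finish


def handle_message(msg, poster):
    # Calling post_status_to_db here directly would delay the next receive.
    poster.post(f"handled {msg}")
    return f"ack {msg}"


poster = StatusPoster()
start = time.monotonic()
acks = [handle_message(m, poster) for m in ("m1", "m2", "m3")]
elapsed = time.monotonic() - start  # small: the handler did not wait for the writes
poster.close()
print(acks, poster.posted)
```

A production version would also need to decide what happens when a write fails or the queue grows faster than the database can absorb it; the sketch ignores both.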

Tags: performance, profiling, distributed-computing, hpc, distributed

redblackbit
