
How do you measure latency in low-latency environments?

Here's the setup: your system receives a stream of data containing discrete messages (usually 32-128 bytes each). As part of your processing pipeline, each message passes through two physically separate applications, which exchange the data using a low-latency mechanism (such as messaging over UDP, or RDMA), and is finally delivered to a client via the same mechanism.

Assuming you can inject yourself at any level, including wire protocol analysis, what tools and/or techniques would you use to measure the latency of your system? As part of this, I'm assuming that every message delivered to the system results in a corresponding (though not equivalent) message being pushed through the system and delivered to the client.

The only tool I've seen on the market for this is TS-Associates TipOff. I'm sure that with the right access you could measure the same information using a wire analysis tool (à la Wireshark) and the right dissectors, but is this the right approach, or are there any commodity solutions that I can use?

Ajaxx asked Aug 05 '09 21:08



2 Answers

The approach in your last paragraph is how it is typically done. The usual suspects in this field (at least as far as I know, for Wall Street market data latency) are:

  • TSA (TS Associates)
  • Correlix
  • Corvil
  • Napatech (hardware capture devices)
  • Endace (hardware capture devices)

There was another badly run company that recently burned through their VC money (4 million?).

For data that is processed into a different format (say, at a direct exchange feed, RMDS, or another server that changes the protocol), you need to be able to parse the payloads to correlate the messages. This can be challenging, since data vendors sometimes do not expose the message definitions.

I think there are hardware devices that will inject timestamped payload information so the client can see it. Of course, as another poster pointed out, the question of time is very important: all the devices and clients must share the same reference point for time, and it has to be accurate.

The last time I spoke with TSA, an installation with 4 observation points was on the order of $150k. I suspect that the others listed above are similar in price.

The hardware cards listed above start around $2k (for a bare bones card) and go up (significantly) from there.

To do it in software, you'd need clients that capture with pcap (or something similar), inspect the payloads, and try to match them up. In some cases it is difficult to make this deterministic, especially at the start of a "session" or if messages are missing from one pipe. Usually, if a message is not matched within some threshold, you just drop it.
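As a rough illustration of the matching step, here is a minimal Python sketch. It assumes the captures have already been decoded into time-sorted (timestamp, key) pairs, that both capture points share one clock, and that the key (for example a sequence number parsed from the payload) is unique within the timeout window. The function name `match_latencies` and the data layout are made up for this example and do not come from any of the products above.

```python
import heapq


def match_latencies(ingress, egress, timeout):
    """Pair ingress and egress observations by key.

    ingress, egress: iterables of (timestamp, key), each sorted by timestamp.
    timeout: how long an ingress message may wait for its egress match.
    Returns (latencies, unmatched): a dict key -> latency, and a list of
    ingress keys that were never matched.
    """
    pending = {}      # key -> ingress timestamp; insertion order is time order
    latencies = {}
    unmatched = []

    # Tag each observation with its side so both streams can be merged by time.
    tagged_in = ((ts, 0, key) for ts, key in ingress)
    tagged_out = ((ts, 1, key) for ts, key in egress)

    for ts, side, key in heapq.merge(tagged_in, tagged_out):
        # Drop ingress messages that have waited longer than the threshold.
        while pending:
            oldest = next(iter(pending))
            if ts - pending[oldest] <= timeout:
                break
            del pending[oldest]
            unmatched.append(oldest)

        if side == 0:
            pending[key] = ts
        elif key in pending:
            latencies[key] = ts - pending.pop(key)
        # An egress message with no pending ingress (for example, the capture
        # started mid-session) is silently ignored.

    unmatched.extend(pending)  # still waiting when the capture ended
    return latencies, unmatched
```

For example, with timestamps in nanoseconds, `match_latencies([(0, "A"), (1000, "B")], [(500, "A")], timeout=10000)` pairs "A" and reports "B" as unmatched. A real deployment would still need the payload parsing and clock synchronisation discussed above.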

EDIT: DISCLAIMER: I am also part of the venture now and should disclose that.

Tim answered Nov 10 '22 00:11


A recent paper might be of some use (and would also be much cheaper than hardware-based solutions). There are also ways of accounting for clock skew fairly accurately. The last time I seriously looked into one-way latency measurement research (a couple of years ago), the most accurate technique was a linear programming algorithm by Sue Moon (with reference code conveniently available here). Without some rather modern linear programming techniques, though, it's fairly impractical to run as an online algorithm. It's best to record timestamps periodically throughout the day without doing any calculations, and then run the LP algorithm on the accumulated data afterwards. A few other techniques were quick enough to run online (including the seminal paper by Vern Paxson), but they were all much less accurate.
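To give a feel for the offline approach, here is a small Python sketch in the spirit of the linear programming idea. It is not the reference code mentioned above. It assumes the formulation is: find the line that lies on or below every (send time, measured one-way delay) point while minimising the total vertical distance to the points, and treat its slope as the clock skew. With only two unknowns the optimum passes through two of the points, so this sketch simply tries every pair, which is fine for small batches but far too slow for a full day of data. The names `fit_skew_line` and `remove_skew` are hypothetical.

```python
from itertools import combinations


def fit_skew_line(times, delays, eps=1e-9):
    """Return (skew, offset) of the best line lying on or below all points.

    times: send timestamps; delays: measured one-way delays (receiver clock
    minus sender clock). Brute force over point pairs, O(n^3).
    """
    n = len(times)
    if n != len(delays) or n < 2:
        raise ValueError("need at least two (time, delay) points")

    total_t = sum(times)
    best = None
    for i, j in combinations(range(n), 2):
        if times[i] == times[j]:
            continue
        a = (delays[j] - delays[i]) / (times[j] - times[i])
        b = delays[i] - a * times[i]
        # Feasible only if the line never rises above a measured delay.
        if any(a * t + b > d + eps for t, d in zip(times, delays)):
            continue
        # Minimising sum(d - a*t - b) is the same as maximising this score.
        score = a * total_t + n * b
        if best is None or score > best[0]:
            best = (score, a, b)

    if best is None:
        raise ValueError("no feasible line found")
    return best[1], best[2]


def remove_skew(times, delays, skew):
    """Subtract the estimated drift, relative to the first timestamp."""
    t0 = times[0]
    return [d - skew * (t - t0) for t, d in zip(times, delays)]
```

Typical use would be to collect timestamps during the day, call `fit_skew_line` on the accumulated batch afterwards, and then pass the resulting skew to `remove_skew` before computing latency statistics.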

strangelydim answered Nov 10 '22 00:11