I am measuring the performance of InfiniBand using iperf.
It's a one-to-one connection between a server and a client.
I measured the bandwidth while varying the number of threads issuing network I/O requests.
Measured bandwidth on the cluster server:
1 thread : 1.34 GB/sec,
2 threads : 1.55 GB/sec ~ 1.75 GB/sec,
4 threads : 2.38 GB/sec,
8 threads : 2.03 GB/sec,
16 threads : 2.00 GB/sec,
32 threads : 1.83 GB/sec.
As you can see above, the bandwidth increases up to 4 threads and decreases after that.
Could you give me some ideas to help me understand what's happening there?
Additionally, what happens when many machines send data to one machine at the same time (contention)?
Can InfiniBand handle that too?
There are a lot of things going on under the covers here, but one of the biggest bottlenecks in InfiniBand is the QP cache in the firmware.
The firmware has a very small QP cache (on the order of 16-32 entries, depending on which adapter you are using). When the number of active QPs exceeds this cache, any benefit of using IB starts to degrade. From what I know, the performance penalty for a cache miss is on the order of milliseconds. Yes, that's right: milliseconds.
There are many other caches involved as well.
IB has multiple transports, the two most common being:
1. RC - Reliable Connected
2. UD - Unreliable Datagram
Reliable Connected mode is somewhat like TCP in that it requires an explicit connection and is point-to-point between two processes. Each process allocates a QP (Queue Pair), which is similar to a socket in the Ethernet world, but a QP is a much more expensive resource than a socket for many different reasons.
UD (Unreliable Datagram) mode is like UDP in that it does not need a connection. A single UD QP can talk to any number of remote UD QPs.
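To make the QP/socket analogy concrete, here is a minimal sketch of allocating an RC QP and a UD QP with libibverbs. It assumes the first device on the machine, uses arbitrary queue sizes, and skips the connection handshake and error handling entirely; the point is only that both transports go through the same `ibv_create_qp` call with a different `qp_type`.

```c
/* Minimal sketch: allocating an RC vs. a UD QP with libibverbs.
 * Device selection and queue sizes are assumptions; build with -libverbs. */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no IB devices\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);
    struct ibv_cq *cq = ibv_create_cq(ctx, 64, NULL, NULL, 0);

    struct ibv_qp_init_attr attr = {
        .send_cq = cq,
        .recv_cq = cq,
        .cap = { .max_send_wr = 64, .max_recv_wr = 64,
                 .max_send_sge = 1, .max_recv_sge = 1 },
        .qp_type = IBV_QPT_RC,   /* point-to-point, reliable (TCP-like) */
    };
    struct ibv_qp *rc_qp = ibv_create_qp(pd, &attr);

    attr.qp_type = IBV_QPT_UD;   /* connectionless (UDP-like); one QP can
                                    address many remote UD QPs */
    struct ibv_qp *ud_qp = ibv_create_qp(pd, &attr);

    printf("RC QPN=%u, UD QPN=%u\n", rc_qp->qp_num, ud_qp->qp_num);

    ibv_destroy_qp(ud_qp);
    ibv_destroy_qp(rc_qp);
    ibv_destroy_cq(cq);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```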
If your data model is one-to-many, i.e. one machine talking to many machines, and you need a reliable connection with huge data sizes, then you are out of luck; IB starts losing some of its effectiveness. Each remote peer needs its own RC QP, so the number of active QPs quickly blows past that 16-32 entry firmware cache.
If you have the resources to build a reliable layer on top, then use UD to get scalability.
If your data model is one-to-many, but the many remote processes reside on the same machine, then you can use RDS (Reliable Datagram Sockets), which is a socket interface to InfiniBand that multiplexes many connections over a single RC connection between two machines. (RDS has its own set of weird issues, but it's a start.)
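For illustration, here is a hedged sketch of what the RDS socket interface looks like from user space. The addresses and port are placeholders, and it assumes the `rds` (and `rds_rdma`) kernel modules are loaded; the key point is that it is plain BSD socket code, while the kernel handles the per-machine-pair RC multiplexing underneath.

```c
/* Hedged sketch: sending one datagram over RDS. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#ifndef AF_RDS
#define AF_RDS 21   /* from linux/socket.h, for older libc headers */
#endif

int main(void)
{
    int fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
    if (fd < 0) { perror("socket(AF_RDS)"); return 1; }

    /* RDS requires binding to a local IP/port before sending. */
    struct sockaddr_in local = { .sin_family = AF_INET,
                                 .sin_port   = htons(4000) };
    inet_pton(AF_INET, "192.168.1.10", &local.sin_addr);  /* placeholder */
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        perror("bind"); return 1;
    }

    struct sockaddr_in peer = { .sin_family = AF_INET,
                                .sin_port   = htons(4000) };
    inet_pton(AF_INET, "192.168.1.11", &peer.sin_addr);   /* placeholder */

    const char msg[] = "hello over RDS";
    if (sendto(fd, msg, sizeof(msg), 0,
               (struct sockaddr *)&peer, sizeof(peer)) < 0)
        perror("sendto");

    close(fd);
    return 0;
}
```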
There is a third, newer transport called XRC (eXtended Reliable Connected) which mitigates some of the scalability issues as well, but it has its own caveats.
Since iperf uses TCP (presumably running over IPoIB here), it will not get all the bandwidth possible with native InfiniBand verbs.
How many cores does your CPU have? Once the number of threads exceeds the number of cores, threads get time slices and run serially on the same cores instead of running in parallel, and they start getting in each other's way.
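A quick way to check is shown below; `sysconf` is POSIX, and `nproc` on the command line gives the same answer. Compare the result against the thread count where your bandwidth peaked (4 in your measurements).

```c
/* Print the number of online cores, to compare against the iperf
 * thread count at which bandwidth peaked. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    printf("online cores: %ld\n", cores);
    return 0;
}
```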