RDMA memory sharing

I have a few multi-core computers connected by an InfiniBand network. I would like to run low-latency computations on a pool of shared memory, using remote atomic operations. I know RDMA is the way to go. On each node I would register a memory region (and protection domain) for data sharing.

The online RDMA examples usually focus on a single connection between a single-threaded server and a single-threaded client. I would instead like to run a multi-threaded process on each InfiniBand node, and I am puzzled about the following:

How many queue pairs should I prepare on each node, for a cluster of n nodes and m threads in total? More specifically, can multiple threads on the same node share the same queue pair?
How many completion queues should I prepare on each node? I will have multiple threads issuing remote read/write/CAS operations on each node. If they shared a common completion queue, the completion events would be mixed up. If each thread had its own completion queue, there would be a lot of them.
Would you suggest using an existing library instead of writing this software myself? (Hmm, or should I write one and open-source it? :-)

Thank you for your kind suggestion(s).

asked Feb 27 '12 18:02

Kinson Chan

1 Answer

On Linux at least, the InfiniBand verbs library is completely thread-safe. So you can use as many or as few queue pairs (QPs) in your multi-threaded app as you want -- multiple threads can post work requests to a single QP safely, although of course you will have to make sure whatever tracking of outstanding requests, etc. that you do in your own application is thread-safe.
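As an illustration of the application-side tracking mentioned above, here is a minimal C sketch; it is not part of the verbs library. It relies on the fact that each work request carries a 64-bit `wr_id` (in `struct ibv_send_wr`) that the matching completion reports back (in `struct ibv_wc`), so threads sharing one QP and CQ can tell whose completion is whose. The names `request_table`, `table_reserve`, `table_complete` and the limit `MAX_OUTSTANDING` are hypothetical, and the real `ibv_post_send` and `ibv_poll_cq` calls appear only in comments.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_OUTSTANDING 64        /* hypothetical limit for this sketch */
#define NO_SLOT UINT64_MAX        /* returned when the table is full */

/* Table of outstanding requests shared by all threads that post to one QP. */
struct request_table {
    pthread_mutex_t lock;
    int in_use[MAX_OUTSTANDING];
    int owner[MAX_OUTSTANDING];   /* application-defined thread index */
};

void table_init(struct request_table *t)
{
    assert(t != NULL);
    pthread_mutex_init(&t->lock, NULL);
    for (int i = 0; i < MAX_OUTSTANDING; i++) {
        t->in_use[i] = 0;
        t->owner[i] = -1;
    }
}

/* Reserve a slot for a request issued by `owner`. The returned value is
 * meant to be stored in the work request's wr_id before ibv_post_send(). */
uint64_t table_reserve(struct request_table *t, int owner)
{
    uint64_t id = NO_SLOT;
    pthread_mutex_lock(&t->lock);
    for (int i = 0; i < MAX_OUTSTANDING; i++) {
        if (!t->in_use[i]) {
            t->in_use[i] = 1;
            t->owner[i] = owner;
            id = (uint64_t)i;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return id;
}

/* Called by whichever thread obtained a completion from ibv_poll_cq(),
 * passing the wr_id found in it. Frees the slot and returns the owner,
 * or -1 if wr_id does not name an outstanding request. */
int table_complete(struct request_table *t, uint64_t wr_id)
{
    int owner = -1;
    if (wr_id >= MAX_OUTSTANDING)
        return -1;
    pthread_mutex_lock(&t->lock);
    if (t->in_use[wr_id]) {
        owner = t->owner[wr_id];
        t->in_use[wr_id] = 0;
        t->owner[wr_id] = -1;
    }
    pthread_mutex_unlock(&t->lock);
    return owner;
}
```

A thread would call `table_reserve` before posting and hand the result to the hardware as `wr_id`; the polling thread then calls `table_complete` to find out which thread to notify. A single mutex is the simplest choice here; a design with heavier contention might prefer per-thread tables or encode the thread index directly in `wr_id`.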

It is true that each send queue and each receive queue (remember that a QP is really a pair of queues :) is attached to a single completion queue (CQ). So if you want each thread to have its own CQ, then each thread will need its own QP to submit work into.

In general QPs and CQs are not really a limited resource -- you can easily have hundreds or thousands on a single node without trouble. So you can design your app without worrying too much about the absolute number of queues you're using. This is not to say you don't have to worry about scalability -- for example if you have a lot of receive queues and a lot of buffers per queue, then you may tie up too much memory in receive buffering, so you end up needing to use shared receive queues (SRQs).
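To put rough numbers on this, here is a small C sketch of the arithmetic. It assumes reliable-connected QPs, where each QP is connected to exactly one remote QP, and it assumes that in the per-thread design every thread opens one QP to each remote node. The function names and parameters are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* QPs one node needs in order to reach every other node.
 * per_thread_qps == 0: all local threads share one QP per remote node.
 * per_thread_qps != 0: each local thread has its own QP per remote node. */
uint64_t qps_per_node(uint64_t nodes, uint64_t threads_per_node,
                      int per_thread_qps)
{
    assert(nodes >= 1);           /* a cluster has at least one node */
    uint64_t peers = nodes - 1;
    return per_thread_qps ? peers * threads_per_node : peers;
}

/* Memory tied up in posted receive buffers when every receive queue
 * keeps its own set of buffers. */
uint64_t recv_buffer_bytes(uint64_t queues, uint64_t buffers_per_queue,
                           uint64_t buffer_size)
{
    return queues * buffers_per_queue * buffer_size;
}
```

For example, with 8 nodes and 16 threads per node, `qps_per_node` gives 7 QPs per node for the shared design and 112 for the per-thread design. If each of those 112 receive queues held 512 buffers of 4096 bytes, `recv_buffer_bytes` gives 234,881,024 bytes (224 MiB) per node, whereas a single SRQ holding 512 such buffers would need 2,097,152 bytes (2 MiB).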

There are a number of middleware libraries that use IB; MPI (e.g. http://open-mpi.org/) is probably the best known, and it's worth evaluating before you get too far into reinventing things. The MPI developers have also published a lot of research about using IB/RDMA efficiently, which is worth seeking out in case you do decide to build your own system.

answered Jan 03 '23 13:01

Roland

Tags: infiniband, rdma

Kinson Chan

People also ask

1 Answers

Roland

Recent Activity

Donate For Us