I was looking over the Linux loopback and IP network data handling, and it seems that there is no code to cover the case where two CPUs on different sockets are passing data via the loopback interface.
I think it should be possible to detect this condition and then use hardware DMA, when available, to copy the data to the receiver and avoid NUMA contention.
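For illustration, a minimal user-space sketch of the condition I have in mind, using libnuma's numa_node_of_cpu() and sched_getcpu(); the real check would of course have to live in the kernel's loopback path, and the peer CPU number here is a hypothetical parameter that would need to be learned some other way.

#define _GNU_SOURCE
#include <sched.h>   /* sched_getcpu() */
#include <stdio.h>
#include <numa.h>    /* libnuma: numa_available(), numa_node_of_cpu(); link with -lnuma */

/* Returns 1 if this thread and the peer's CPU sit on different NUMA nodes.
 * peer_cpu is hypothetical here; in practice it would be obtained out of band. */
static int crosses_numa_nodes(int peer_cpu)
{
    if (numa_available() < 0)
        return 0;                                /* no NUMA support: nothing to detect */
    int my_node   = numa_node_of_cpu(sched_getcpu());
    int peer_node = numa_node_of_cpu(peer_cpu);
    return my_node >= 0 && peer_node >= 0 && my_node != peer_node;
}

int main(void)
{
    printf("cross-socket transfer: %d\n", crosses_numa_nodes(12 /* hypothetical peer CPU */));
    return 0;
}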
My questions are:
There are several projects/attempts to add interfaces to memory-to-memory DMA engines, intended for use in HPC (MPI):
process_vm_readv, process_vm_writev: http://man7.org/linux/man-pages/man2/process_vm_readv.2.html
KNEM may use the Intel I/OAT DMA engine on some microarchitectures and for some buffer sizes.
I/OAT copy offload through DMA Engine: "One interesting asynchronous feature is certainly I/OAT copy offload." It is requested per copy by setting a flag on the copy command:
icopy.flags = KNEM_FLAG_DMA;   /* ask KNEM to offload this copy to the I/OAT DMA engine */
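To show where that flag fits, here is a rough sketch of KNEM's ioctl interface (declare a region on one side, copy into it from the other). The struct, flag, and command names are written from memory of knem_io.h and may not match your KNEM version exactly; error handling is omitted.

#include <stdint.h>
#include <stddef.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include "knem_io.h"                     /* KNEM user-space header */

/* Receiver: expose a buffer as a KNEM region; the returned cookie is sent
 * to the sender out of band (e.g. over a control socket). */
static uint64_t expose_region(int knem_fd, void *buf, size_t len)
{
    struct knem_cmd_param_iovec iov = { .base = (uintptr_t)buf, .len = len };
    struct knem_cmd_create_region create = {
        .iovec_array = (uintptr_t)&iov,
        .iovec_nr    = 1,
        .flags       = KNEM_FLAG_SINGLEUSE,  /* region is destroyed after one use */
        .protection  = PROT_WRITE,           /* the sender will write into it */
    };
    ioctl(knem_fd, KNEM_CMD_CREATE_REGION, &create);
    return create.cookie;
}

/* Sender: copy a local buffer into the remote region, requesting I/OAT DMA
 * offload; knem_fd comes from open("/dev/knem", O_RDWR). */
static void push_with_dma(int knem_fd, uint64_t cookie, void *buf, size_t len)
{
    struct knem_cmd_param_iovec iov = { .base = (uintptr_t)buf, .len = len };
    struct knem_cmd_inline_copy icopy = {
        .local_iovec_array = (uintptr_t)&iov,
        .local_iovec_nr    = 1,
        .remote_cookie     = cookie,
        .remote_offset     = 0,
        .write             = 1,              /* push into the remote region */
        .flags             = KNEM_FLAG_DMA,  /* ask for the DMA engine */
    };
    ioctl(knem_fd, KNEM_CMD_INLINE_COPY, &icopy);
    /* on completion icopy.current_status should be KNEM_STATUS_SUCCESS */
}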
Some authors say that the hardware DMA engine has no benefit on newer Intel microarchitectures:
http://www.ipdps.org/ipdps2010/ipdps2010-slides/CAC/slides_cac_Mor10OptMPICom.pdf
i.e., I/OAT copy offload is said to be useful only for obsolete microarchitectures.
CMA (Cross Memory Attach, the mechanism behind process_vm_readv/process_vm_writev) was announced as a project similar to KNEM: http://www.open-mpi.org/community/lists/devel/2012/01/10208.php
These system calls were designed to permit fast message passing by allowing messages to be exchanged with a single copy operation (rather than the double copy that would be required when using, for example, shared memory or pipes).
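As a minimal sketch of that single-copy path (the pid and remote address are assumed to have been exchanged out of band, e.g. over a control socket, and the caller needs ptrace-level permission on the target process):

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>                    /* process_vm_readv(), struct iovec */

/* Pull len bytes from remote_addr in process pid straight into local_buf:
 * one copy, no shared mapping, no pipe. Returns bytes transferred or -1. */
static ssize_t pull_from_peer(pid_t pid, void *remote_addr, void *local_buf, size_t len)
{
    struct iovec local  = { .iov_base = local_buf,   .iov_len = len };
    struct iovec remote = { .iov_base = remote_addr, .iov_len = len };
    return process_vm_readv(pid, &local, 1, &remote, 1, 0 /* flags must be 0 */);
}

This is essentially how MPI implementations use CMA for large intra-node messages.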
If you can, you should not use sockets (especially TCP sockets) to transfer data; they have high software overhead that is not needed when you are working on a single machine. The standard skb size limit may also be too small to use I/OAT effectively, so the network stack probably will not use I/OAT.