
Optimize socket data transfer over loopback wrt NUMA

I was looking over the Linux loopback and IP network data handling, and it seems that there is no code to cover the case where 2 CPUs on different sockets are passing data via the loopback.

I think it should be possible to detect this condition and, when hardware DMA is available, use it to copy the data to the receiver and so avoid NUMA contention.

My questions are:

  • Am I correct that this is not currently done in Linux?
  • Is my thinking that this is possible on the right track?
  • What kernel APIs or existing drivers should I study to help complete such a version of the loopback?
jxh asked Apr 23 '15 22:04



1 Answer

There are several projects that add interfaces for fast intra-node memory-to-memory copies, in some cases using DMA engines, intended for use in HPC (MPI):

  • KNEM kernel module - High-Performance Intra-Node MPI Communication - http://knem.gforge.inria.fr/
  • Cross Memory Attach (CMA) - New syscalls process_vm_readv, process_vm_writev: http://man7.org/linux/man-pages/man2/process_vm_readv.2.html

KNEM can use the Intel I/OAT DMA engine on some microarchitectures and for some message sizes:

I/OAT copy offload through DMA Engine: One interesting asynchronous feature is certainly I/OAT copy offload.

    icopy.flags = KNEM_FLAG_DMA;

Some authors report that a hardware DMA engine brings no benefit on newer Intel microarchitectures:

http://www.ipdps.org/ipdps2010/ipdps2010-slides/CAC/slides_cac_Mor10OptMPICom.pdf

I/OAT only useful for obsolete architectures

CMA was announced as a project similar to KNEM: http://www.open-mpi.org/community/lists/devel/2012/01/10208.php

These system calls were designed to permit fast message passing by allowing messages to be exchanged with a single copy operation (rather than the double copy that would be required when using, for example, shared memory or pipes).

If you can, avoid sockets (especially TCP sockets) for this kind of transfer: they carry high software overhead that is unnecessary when both endpoints are on a single machine. The standard skb size limit may also be too small to use I/OAT effectively, so the network stack probably will not use I/OAT.

osgx answered Nov 03 '22 01:11
