Is there a Linux DMA mem-to-mem copy mechanism available to userspace?
I have a Linux application that routinely (50-100 times a second) has to memcpy several megabytes (10+) of data around. Often it's not an issue, but we've begun to see evidence that it may be consuming too much of our CPU bandwidth. Current measurements put it at roughly 1 GB/s that we're moving around.
I'm aware of the DMA capability in the kernel, and I see a bit of documentation talking about building custom drivers for large memory copies, for this very reason. But it seems someone would have built a generic API for this by now. Am I wrong? Is DMA a kernel-only feature?
I should clarify, this is for Intel x86 architecture, not embedded.
The DMA is used to transfer ten words (32-bit) of data from one memory location to another without any CPU load. The transfer of data is triggered by software. The source is the Data Scratch Pad SRAM of CPU0 (DSPR0) and the destination is the Local Memory Unit (LMURAM).
DMA was introduced to allow devices to directly access system memory without interrupting the processor. In this model, an additional device (called a DMA engine) would handle the details of memory transfers.
DMA essentially freezes the CPU, disconnecting it from the memory and I/O busses, so that specialized data-moving hardware can transfer data between memory and peripherals.
DMA (Direct memory access) is an alternative method of communication to I/O ports that permits the device to transfer data directly, without the CPU's attention. The system can request that the data be fetched into a particular memory region and continue with other tasks until the data is ready.
Linux's API for DMA doesn't permit memory to memory transfers. It's only for communication between devices and memory. Look in Documentation/DMA-API.txt for more details.
At hardware level, the x86 DMA controller doesn't allow memory to memory transfers. It's been discussed here: DMA transfer RAM-to-RAM
Given that the memory bus is usually slower than the CPU, what benefit would there be in launching a kernel-driven memory copy? You'd still have to wait for the transfer to finish, and its duration would still be determined by the memory bandwidth, exactly as with a CPU-driven copy.
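To see how close a CPU-driven copy already comes to the memory bandwidth limit, it can help to measure it directly. The following is a minimal sketch, not a rigorous benchmark: the helper names `memcpy_rate` and `core_fraction` are made up for illustration, and the buffer size and iteration count are whatever you choose to pass in.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: time `iterations` memcpy calls of `bytes` bytes each
 * and return the observed rate in bytes per second (-1.0 on failure). */
static double memcpy_rate(size_t bytes, int iterations)
{
    assert(bytes > 0 && iterations > 0);

    unsigned char *src = malloc(bytes);
    unsigned char *dst = malloc(bytes);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch every page first so page faults are not part of the timing. */
    memset(src, 1, bytes);
    memset(dst, 2, bytes);

    volatile unsigned char sink = 0;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++) {
        src[0] = (unsigned char)i;   /* change the input on each pass */
        memcpy(dst, src, bytes);
        sink ^= dst[bytes - 1];      /* consume the result so the copy is kept */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(src);
    free(dst);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        return -1.0;
    return (double)bytes * (double)iterations / secs;
}

/* Fraction of one core needed to sustain `required` bytes/s, given a
 * measured single-threaded copy rate in bytes/s. */
static double core_fraction(double required, double measured)
{
    return required / measured;
}
```

Calling `memcpy_rate(10 * 1024 * 1024, 100)` and passing the result to `core_fraction(1e9, rate)` gives a rough idea of how much of one core the 1 GB/s workload from the question occupies. The number will vary with cache sizes and with whatever else the machine is doing.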
If your program's performance depends solely on memory to memory copy performance, it can probably be improved considerably by avoiding copies as much as possible, or by implementing a smarter procedure such as copy on write.
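One common way to avoid a copy, when the producer no longer needs the data after handing it over, is to exchange buffer descriptors instead of moving the payload. This is only a sketch under that assumption; the `struct frame` type and the `frame_*` functions are hypothetical names, not part of any library.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical descriptor that owns one large heap buffer. */
struct frame {
    unsigned char *data;
    size_t len;
};

/* Allocate a zero-filled buffer; returns 0 on success, -1 on failure. */
static int frame_init(struct frame *f, size_t len)
{
    f->data = calloc(len, 1);
    f->len = f->data ? len : 0;
    return f->data ? 0 : -1;
}

/* Release the buffer and reset the descriptor. */
static void frame_free(struct frame *f)
{
    free(f->data);
    f->data = NULL;
    f->len = 0;
}

/* Exchange two descriptors. Only a pointer and a length move, so the cost
 * is constant no matter how many megabytes the buffers hold. */
static void frame_swap(struct frame *a, struct frame *b)
{
    assert(a && b);
    struct frame tmp = *a;
    *a = *b;
    *b = tmp;
}
```

A producer would fill its frame, call `frame_swap(&producer_frame, &consumer_frame)`, and then keep writing into the buffer it received in exchange. This only works when both sides can tolerate giving up the buffer they held before; if the producer must keep its data intact, some form of copy or copy on write is still needed.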