
DMA transfer taking more time than CPU transfer

Tags: c, memcpy, dma, stm32

Our task is intended to demonstrate the benefit of using DMA to copy a large amount of data versus relying on the processor to directly handle the copying. The processor is an STM32F407 on the ST discovery board.

In order to measure the copying time, a GPIO pin must be turned ON during copying and OFF once it has been copied.

The code appears to work, but it currently shows the CPU copy taking about 2.15 ms and the DMA copy about 4.5 ms, which is the opposite of what was intended. Is there perhaps simply not enough data for the DMA's higher speed to offset the overhead of setting it up?

I have tried both copying the array element by element with the CPU and using the memcpy function; the two gave very similar times.

The function code is shown below:

void DMASpeed(void)
{
    #define elementNum 32000
    int *ptr = NULL;
    ptr = (int*)malloc(elementNum * sizeof(int));
    int *ptr2 = NULL;
    ptr2 = (int*)malloc(elementNum * sizeof(int));
    for (int i = 0; i < elementNum; i++)
    {
        ptr[i] = 4;
    }
    LD5_GPIO_Port->BSRR = (uint32_t)LD5_Pin << 16U;
    LD6_GPIO_Port->BSRR = (uint32_t)LD6_Pin << 16U;
    // Initial value
    // printf("BEFORE: dst = '%s'\n", dst);

    // Transfer
    printf("Initiate DMA Transfer...\n");
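    // Note: DataLength is a count of data items at the configured width, and the
    // stream's NDTR counter is 16 bits wide, so 128000 does not fit in one transfer.
    HAL_DMA_Start(&hdma_memtomem_dma2_stream0, (uint32_t)ptr, (uint32_t)ptr2, (elementNum * sizeof(int)));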
    LD5_GPIO_Port->BSRR = LD5_Pin;
    printf("DMA Transfer initiated.\n");


    // Poll for DMA completion
    printf("Poll for DMA completion.\n");
    HAL_DMA_PollForTransfer(&hdma_memtomem_dma2_stream0,
        HAL_DMA_FULL_TRANSFER, HAL_MAX_DELAY);
    LD5_GPIO_Port->BSRR = (uint32_t)LD5_Pin << 16U;
    printf("DMA complete.\n");

    // Print result
    // printf("AFTER: dst = '%s'\n", dst);
    free(ptr);
    free(ptr2);

    ptr = (int*)malloc(elementNum * sizeof(int));
    ptr2 = (int*)malloc(elementNum * sizeof(int));
    for (int i = 0; i < elementNum; i++)
    {
        ptr[i] = i;
    }

    printf("Initiate CPU Transfer...\n");
    LD6_GPIO_Port->BSRR = LD6_Pin;
    //  for (int i = 0; i<512; i++)
    //  {
    //  ptr2[i] = ptr[i];
    //  }
    memcpy(ptr2, ptr, (elementNum * sizeof(int)));
    printf("CPU Transfer Complete.\n");
    LD6_GPIO_Port->BSRR = (uint32_t)LD6_Pin << 16U;

    free(ptr);
    free(ptr2);
}

Thanks in advance for any assistance

Joe P asked May 14 '19 04:05




2 Answers

You are trying to prove something that is not true. A DMA memory-to-memory transfer will always be slower than a direct CPU copy. DMA was not intended to be faster than the CPU; it is there to carry out the transfer in the background without CPU involvement. The core always has priority over the DMA.

A memory-to-memory DMA transfer will always be slower than the CPU one.

There is another problem as well. Many STM32 devices have memory regions that the DMA cannot access (for example CCM RAM).
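
As a rough illustration of the CCM RAM point, the check below tests whether a buffer overlaps the core-coupled memory before it is handed to the DMA. It is only a sketch and assumes the STM32F407 memory map (64 KB of CCM data RAM starting at 0x10000000). The names CCM_BASE, CCM_SIZE and buffer_overlaps_ccm are made up for this example and are not part of the HAL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed STM32F407 memory map: 64 KB of CCM data RAM at 0x10000000.
 * Other parts differ, so check the reference manual for the device in use. */
#define CCM_BASE 0x10000000u
#define CCM_SIZE (64u * 1024u)

/* Hypothetical helper: true if any byte of [addr, addr + len) lies in CCM RAM. */
bool buffer_overlaps_ccm(uint32_t addr, uint32_t len)
{
    uint64_t ccm_end = (uint64_t)CCM_BASE + CCM_SIZE; /* one past the last CCM byte */
    uint64_t buf_end = (uint64_t)addr + len;          /* one past the last buffer byte */

    if (len == 0u) {
        return false;                                 /* an empty buffer overlaps nothing */
    }
    return addr < ccm_end && buf_end > CCM_BASE;
}
```

On the target, one might call buffer_overlaps_ccm((uint32_t)ptr, elementNum * sizeof(int)) for both buffers before HAL_DMA_Start. A false result only rules out CCM RAM under the stated assumption; it does not prove that the DMA can reach the address.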

0___________ answered Sep 24 '22 18:09


Remove the printf calls in the code segment below:

LD5_GPIO_Port->BSRR = LD5_Pin;
printf("DMA Transfer initiated.\n");  // <--Remove this


// Poll for DMA completion
printf("Poll for DMA completion.\n"); // <--Remove this

You are turning the pin ON and then printing a long string, so the time spent printing is added to your measured total.

Remove all the printf calls, or at least do not print anything between the pin toggles.

EDIT:

Specifically, you print about 50 characters inside the timed window in the DMA case and 23 characters in the CPU case.
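
To get a feel for how much that printed text can cost, the sketch below estimates the time needed to push the strings out of a UART. It assumes that printf is retargeted to a blocking UART running at 115200 baud with 8N1 framing (10 bits on the wire per character). Neither the transport nor the baud rate is stated in the question, and uart_time_ms is a name invented here.

```c
#include <string.h>

/* Assumption: 8N1 framing, i.e. 1 start + 8 data + 1 stop bit per character. */
#define BITS_PER_CHAR 10.0

/* Text printed while LD5 is high (DMA measurement), copied from the question. */
const char *dma_window_text = "DMA Transfer initiated.\n"
                              "Poll for DMA completion.\n";

/* Text printed while LD6 is high (CPU measurement), copied from the question. */
const char *cpu_window_text = "CPU Transfer Complete.\n";

/* Hypothetical helper: milliseconds needed to send `text` at `baud` bits per second. */
double uart_time_ms(const char *text, double baud)
{
    return (double)strlen(text) * BITS_PER_CHAR * 1000.0 / baud;
}
```

Under these assumptions, uart_time_ms(dma_window_text, 115200.0) comes to roughly 4.3 ms and uart_time_ms(cpu_window_text, 115200.0) to roughly 2.0 ms, which is close to the 4.5 ms and 2.15 ms reported in the question. If the output goes over a different channel or baud rate, the numbers change accordingly.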

Vagish answered Sep 23 '22 18:09