 

Using memcpy and friends with memory-mapped I/O

I'm working on an embedded project which involves I/O on memory-mapped FPGA registers. Pointers to these memory regions need to be marked volatile so the compiler does not "optimize out" reads and writes to the FPGA by caching values in CPU registers.
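To illustrate why the volatile qualifier matters here, the following is a minimal sketch of polling a status register and then reading a data register through a volatile pointer. The register names (`REG_STATUS`, `REG_DATA`), the `STATUS_READY` bit and the helper `read_when_ready` are hypothetical, and a plain array stands in for the FPGA window so the sketch runs on a host.

```c
#include <stdint.h>

/* Hypothetical register layout; offsets are in 32-bit words. */
enum { REG_STATUS = 0, REG_DATA = 1 };
#define STATUS_READY 0x1u

/* On real hardware this pointer would hold the FPGA's mapped address.
 * Here an ordinary array stands in for it so the sketch runs anywhere. */
static uint32_t fake_fpga[4];
static volatile uint32_t *const fpga = fake_fpga;

/* Every loop iteration performs a fresh load of REG_STATUS because the
 * access goes through a volatile lvalue. Without volatile, the compiler
 * could load the value once and spin forever on a stale copy. */
static uint32_t read_when_ready(void)
{
    while ((fpga[REG_STATUS] & STATUS_READY) == 0u) {
        /* busy-wait until the (simulated) device reports ready */
    }
    return fpga[REG_DATA];
}
```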

In a few cases, we want to copy a series of FPGA registers into a buffer for further use. Since the registers are mapped to contiguous addresses, memcpy seems appropriate, but passing our volatile pointer as the source argument gives a warning about discarding the volatile qualifier.

Is it safe (and sane) to cast away the volatile-ness of the pointer to suppress this warning? Unless the compiler does something magical, I can't imagine a scenario where calling memcpy would fail to perform an actual copy. The alternative is to just use a for loop and copy byte by byte, but memcpy implementations can (and do) optimize the copy based on the size of the copy, alignment, etc.

Matt Kline asked Sep 28 '22 12:09



1 Answer

As a developer of both FPGA and embedded software, I see just one clear answer: do not use memcpy et al. for this.

Some reasons:

  • There is no guarantee that memcpy accesses memory in any specific order.
  • The compiler might very well replace the call with inline code.
  • Such accesses often require a certain word size; memcpy does not guarantee that.
  • Gaps in the register map might result in undefined behaviour.
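The word-size and gap concerns in the list above can be addressed by copying only known registers, one full-width access each. This is a minimal sketch assuming 32-bit registers and a hypothetical map in which only the word offsets listed in `valid_regs` are backed by real registers; `snapshot_regs` is likewise an illustrative name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register map with holes: only these word offsets are backed
 * by real registers. Reading the gaps between them might fault or have
 * side effects on real hardware, so they are never touched. */
static const size_t valid_regs[] = { 0, 1, 2, 8, 9 };
#define NUM_VALID_REGS (sizeof valid_regs / sizeof valid_regs[0])

/* Copy only the listed registers into dst, using one 32-bit volatile read
 * per register, in the order given by the table. */
static void snapshot_regs(uint32_t *dst, const volatile uint32_t *base)
{
    for (size_t i = 0; i < NUM_VALID_REGS; i++) {
        dst[i] = base[valid_regs[i]];
    }
}
```

Driving the copy from an explicit table keeps the register map in one place and makes it obvious which addresses are ever accessed.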

You can, however, use a simple for loop and do the copy yourself. This is safe if the registers are accessed through a volatile pointer (see below).
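A minimal sketch of such a loop, assuming the registers are 32 bits wide and contiguous; the function name `copy_from_regs` is hypothetical. Because each read goes through a volatile lvalue, the compiler has to perform exactly `n` loads in program order and, in practice, will not merge, widen or drop them the way an inlined memcpy might.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy n consecutive 32-bit registers into an ordinary buffer.
 * Each src[i] read is a separate volatile access, so the compiler emits
 * one 32-bit load per register, in ascending index order. */
static void copy_from_regs(uint32_t *dst, const volatile uint32_t *src,
                           size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}
```

Note that this only constrains the compiler; how the CPU and buses order these loads is a separate matter.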

Depending on your platform, volatile alone might not be sufficient. The memory area also has to be non-cacheable and strictly ordered (and possibly non-shared). Otherwise the system buses might (and on some platforms will) reorder accesses.

Furthermore, you might need barriers/fences to keep your CPU from reordering accesses. Please read your hardware specifications very carefully on this point.
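As a rough illustration of where such barriers go, here is a sketch that writes a command register, waits for a ready bit, and reads a result. The `io_barrier` macro, the register offsets and `start_and_read` are all hypothetical. On ARMv7 or later it uses `dmb sy`; elsewhere it falls back to a C11 `atomic_thread_fence`, which on GCC and Clang also acts as a compiler barrier, but whether that is enough for device memory on your CPU is exactly what the hardware specifications have to tell you.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical I/O barrier; verify the right instruction for your CPU. */
#if defined(__arm__) || defined(__aarch64__)
#define io_barrier() __asm__ volatile("dmb sy" ::: "memory")
#else
#define io_barrier() atomic_thread_fence(memory_order_seq_cst)
#endif

/* Hypothetical register offsets (32-bit words). */
enum { REG_CTRL = 0, REG_STATUS = 1, REG_RESULT = 2 };

/* Start an operation and read back its result. The first barrier keeps the
 * CTRL write ahead of the status polling; the second keeps the RESULT read
 * from being performed before STATUS has reported ready. */
static uint32_t start_and_read(volatile uint32_t *regs, uint32_t cmd)
{
    regs[REG_CTRL] = cmd;
    io_barrier();
    while ((regs[REG_STATUS] & 1u) == 0u) {
        /* wait for the device to set the ready bit */
    }
    io_barrier();
    return regs[REG_RESULT];
}
```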

If you need to transfer larger blocks more often, think about using DMA. If the FPGA uses PCI(e), you could, for instance, use bus-master DMA with scatter/gather (this is not easy to implement; I have done it myself, but it might be worth the effort).
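To show what "scatter/gather" means in practice, here is a sketch of filling a descriptor list so one DMA transfer can cover several non-contiguous buffer fragments. The descriptor layout (`struct sg_desc`, `SG_LAST`) and `build_sg_list` are hypothetical; the real format is defined by the DMA engine in your FPGA design, and obtaining bus addresses and handing the list to the device are platform-specific and not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scatter/gather descriptor layout. */
struct sg_desc {
    uint64_t bus_addr; /* bus address of one buffer fragment */
    uint32_t length;   /* fragment length in bytes */
    uint32_t flags;    /* SG_LAST marks the final descriptor */
};
#define SG_LAST 0x1u

/* Write one descriptor per fragment. Returns the number of descriptors
 * written, or 0 if there are no fragments or they do not fit. */
static size_t build_sg_list(struct sg_desc *desc, size_t max_desc,
                            const uint64_t *addrs, const uint32_t *lens,
                            size_t nfrag)
{
    if (nfrag == 0 || nfrag > max_desc) {
        return 0;
    }
    for (size_t i = 0; i < nfrag; i++) {
        desc[i].bus_addr = addrs[i];
        desc[i].length = lens[i];
        desc[i].flags = (i + 1 == nfrag) ? SG_LAST : 0u;
    }
    return nfrag;
}
```

The device would then walk the list on its own, which is what relieves the CPU of copying register by register.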

The best (and most sane) approach actually depends on multiple factors, like platform, required speed, etc. Of all possible approaches, I would deem using memcpy() one of the less sane ones at best.

too honest for this site answered Sep 30 '22 04:09
