
Linux PCIe DMA Driver (Xilinx XDMA)

I am currently working with the Xilinx XDMA driver (see here for source code: XDMA Source), and am attempting to get it to run (before you ask: I have contacted my technical support point of contact and the Xilinx forum is riddled with people having the same issue). However, I may have found a snag in Xilinx's code that might be a deal breaker for me. I am hoping there is something that I'm not considering.

First off, there are two primary modes of the driver, AXI-Memory Mapped (AXI-MM) and AXI-Streaming (AXI-ST). For my particular application, I require AXI-ST, since data will continuously be flowing from the device.

The driver is written to take advantage of scatter-gather lists. In AXI-MM mode, this works because reads are rather random events (i.e., there isn't a continuous flow of data out of the device; instead, the userspace application simply requests data when it needs to). As such, the DMA transfer is built up, the data is transferred, and the transfer is then torn down. This is a combination of get_user_pages(), pci_map_sg(), and pci_unmap_sg().
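Roughly, that lifecycle looks like the following sketch (function and parameter names are hypothetical, not from the XDMA source, and the exact get_user_pages*() signature varies across kernel versions):

    #include <linux/mm.h>
    #include <linux/pci.h>
    #include <linux/scatterlist.h>

    /* Hypothetical one-shot AXI-MM read: pin, map, transfer, tear down. */
    static int axi_mm_read_sketch(struct pci_dev *pdev, unsigned long uaddr,
                                  int nr_pages, struct page **pages,
                                  struct scatterlist *sgl)
    {
        int i, pinned, rc = -EFAULT;

        /* Pin the userspace pages so they can't move during the DMA. */
        pinned = get_user_pages_fast(uaddr, nr_pages, 1 /* write */, pages);
        if (pinned < 0)
            return pinned;
        if (pinned != nr_pages)
            goto unpin;

        /* Wrap the pinned pages in a scatterlist and map them for DMA. */
        sg_init_table(sgl, nr_pages);
        for (i = 0; i < nr_pages; i++)
            sg_set_page(&sgl[i], pages[i], PAGE_SIZE, 0);

        if (!pci_map_sg(pdev, sgl, nr_pages, PCI_DMA_FROMDEVICE))
            goto unpin;

        /* ... translate sgl into FPGA descriptors, run the transfer ... */

        /* Tear down; note pci_unmap_sg() takes the ORIGINAL nents. */
        pci_unmap_sg(pdev, sgl, nr_pages, PCI_DMA_FROMDEVICE);
        rc = 0;
    unpin:
        for (i = 0; i < pinned; i++)
            put_page(pages[i]);
        return rc;
    }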

For AXI-ST, things get weird, and the source code is far from orthodox. The driver allocates a circular buffer into which data is meant to flow continuously. This buffer is generally sized to be somewhat large (mine is on the order of 32 MB), since you want to be able to ride out transient periods where the userspace application has neglected the driver and can later catch up on the buffered data.

Here's where things get wonky... the circular buffer is allocated using vmalloc32(), and the pages from that allocation are mapped the same way as the userspace buffer in AXI-MM mode (i.e., via the pci_map_sg() interface). As a result, because the circular buffer is shared between the device and the CPU, every read() call requires me to call pci_dma_sync_sg_for_cpu() and pci_dma_sync_sg_for_device(), which absolutely destroys my performance (I cannot keep up with the device!), since these calls operate on the entire buffer. Funnily enough, Xilinx never included these sync calls in their code, so I first knew I had a problem when I edited their test script to attempt more than one DMA transfer before exiting and the resulting data buffer was corrupted.
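Roughly, the read() path ends up having to do something like this (an illustrative sketch with made-up names, not the actual driver code); both sync calls walk every entry of the scatterlist covering the full ring, regardless of how few bytes the caller asked for:

    #include <linux/pci.h>
    #include <linux/scatterlist.h>
    #include <linux/uaccess.h>

    /* Illustrative read() body: ownership ping-pong over the WHOLE ring. */
    static ssize_t ring_read_sketch(struct pci_dev *pdev,
                                    struct scatterlist *sgl, int nents,
                                    char __user *ubuf, void *ring,
                                    size_t roff, size_t count)
    {
        /* Make everything the device has written visible to the CPU... */
        pci_dma_sync_sg_for_cpu(pdev, sgl, nents, PCI_DMA_FROMDEVICE);

        if (copy_to_user(ubuf, ring + roff, count))
            return -EFAULT;

        /* ...then hand the entire ring back before the device DMAs again. */
        pci_dma_sync_sg_for_device(pdev, sgl, nents, PCI_DMA_FROMDEVICE);

        return count;
    }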

As a result, I'm wondering how I can fix this. I've considered rewriting the code to build up my own buffer allocated using pci_alloc_consistent()/dma_alloc_coherent(), but this is easier said than done: the code is architected around scatter-gather lists everywhere (there appears to be a strange, proprietary mapping between the scatter-gather list and the memory descriptors that the FPGA understands).
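For reference, the coherent alternative would look something like this minimal sketch (RING_BYTES and the chunking caveat are my assumptions, not anything in the XDMA source):

    #include <linux/dma-mapping.h>
    #include <linux/pci.h>

    #define RING_BYTES (32UL << 20) /* illustrative 32 MB ring */

    /* Coherent (consistent) memory needs no sync calls, but it must be
     * physically contiguous, so one big allocation may well fail and
     * the ring would likely have to be split into smaller chunks, each
     * with its own FPGA descriptor. */
    static void *alloc_coherent_ring(struct pci_dev *pdev, dma_addr_t *bus)
    {
        return dma_alloc_coherent(&pdev->dev, RING_BYTES, bus, GFP_KERNEL);
    }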

Are there any other API calls I should be aware of? Can I use the "single" variants (i.e., pci_dma_sync_single_for_cpu()) via some translation mechanism to avoid syncing the entire buffer? Alternatively, is there perhaps some function that can make a circular buffer allocated with vmalloc() coherent?

asked Feb 16 '18 by It'sPete



1 Answer

Alright, I figured it out.

Basically, my understanding of the kernel documentation regarding the sync API was incorrect. Namely, I was wrong on two key points:

  1. If the buffer is never written to by the CPU, you don't need to sync it for the device. Removing the sync-for-device call doubled my read() throughput.
  2. You don't need to sync the entire scatterlist. Instead, in my read() call, I now figure out which pages will be affected by the copy_to_user() call (i.e., what is going to be copied out of the circular buffer) and sync only those pages. Basically, I can call something like pci_dma_sync_sg_for_cpu(lro->pci_dev, &transfer->sgm->sgl[sgl_index], pages_to_sync, DMA_FROM_DEVICE), where sgl_index is where I figured out the copy will start and pages_to_sync is the size of the data in pages (see the sketch after this list).
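Concretely, the per-read sync works out to something like the sketch below. The offset arithmetic is illustrative: it assumes one PAGE_SIZE page per entry in a flat (unchained) scatterlist, which is how the ring was mapped, and it ignores ring wraparound, which would need a second sync call; lro, transfer, and sgm are the driver's existing structures.

    /* Sync only the window of the ring this read() will copy out. */
    static void sync_read_window_sketch(struct xdma_dev *lro,
                                        struct xdma_transfer *transfer,
                                        size_t read_off, size_t read_len)
    {
        int sgl_index = read_off >> PAGE_SHIFT;   /* first page touched */
        int pages_to_sync = ((read_off & ~PAGE_MASK) + read_len
                             + PAGE_SIZE - 1) >> PAGE_SHIFT;

        pci_dma_sync_sg_for_cpu(lro->pci_dev,
                                &transfer->sgm->sgl[sgl_index],
                                pages_to_sync, DMA_FROM_DEVICE);
    }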

With the above two changes my code now meets my throughput requirements.

answered Sep 25 '22 by It'sPete