Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Memory access after ioremap very slow

I'm working on a Linux kernel driver that makes a chunk of physical memory available to user space. I have a working version of the driver, but it's currently very slow. So, I've gone back a few steps and tried making a small, simple driver to recreate the problem.

I reserve the memory at boot time using the kernel parameter memmap=2G$1G. Then, in the driver's __init function, I ioremap some of this memory, and initialize it to a known value. I put in some code to measure the timing as well:

#define RESERVED_REGION_SIZE    (1 * 1024 * 1024 * 1024)   // 1GB
#define RESERVED_REGION_OFFSET  (1 * 1024 * 1024 * 1024)   // 1GB

static int __init memdrv_init(void)
{
    struct timeval t1, t2;
    printk(KERN_INFO "[memdriver] init\n");

    // Remap reserved physical memory (that we grabbed at boot time)
    do_gettimeofday( &t1 );
    reservedBlock = ioremap( RESERVED_REGION_OFFSET, RESERVED_REGION_SIZE );
    do_gettimeofday( &t2 );
    printk( KERN_ERR "[memdriver] ioremap() took %d usec\n", usec_diff( &t2, &t1 ) );

    // Set the memory to a known value
    do_gettimeofday( &t1 );
    memset( reservedBlock, 0xAB, RESERVED_REGION_SIZE );
    do_gettimeofday( &t2 );
    printk( KERN_ERR "[memdriver] memset() took %d usec\n", usec_diff( &t2, &t1 ) );

    // Register the character device
    ...

    return 0;
}

I load the driver, and check dmesg. It reports:

[memdriver] init
[memdriver] ioremap() took 76268 usec
[memdriver] memset() took 12622779 usec

That's 12.6 seconds for the memset. That means the memset is running at 81 MB/sec. Why on earth is it so slow?

This is kernel 2.6.34 on Fedora 13, and it's an x86_64 system.

EDIT:

The goal behind this scheme is to take a chunk of physical memory and make it available to both a PCI device (via the memory's bus/physical address) and a user space application (via a call to mmap, supported by the driver). The PCI device will then continually fill this memory with data, and the user-space app will read it out. If ioremap is a bad way to do this (as Ben suggested below), I'm open to other suggestions that'll allow me to get any large chunk of memory that can be directly accessed by both hardware and software. I can probably make do with a smaller buffer also.


See my eventual solution below.

like image 430
Dave Ceddia Avatar asked Dec 15 '10 16:12

Dave Ceddia


3 Answers

ioremap allocates uncacheable pages, as you'd desire for access to a memory-mapped-io device. That would explain your poor performance.

You probably want kmalloc or vmalloc. The usual reference materials will explain the capabilities of each.

like image 139
Eric Seppanen Avatar answered Nov 13 '22 00:11

Eric Seppanen


I don't think ioremap() is what you want there. You should only access the result (what you call reservedBlock) with readb, readl, writeb, memcpy_toio etc. It is not even guaranteed that the return is virtually mapped (although it apparently is on your platform). I'd guess that the region is being mapped uncached (suitable for IO registers) leading to the terrible performance.

like image 20
Ben Jackson Avatar answered Nov 12 '22 23:11

Ben Jackson


It's been a while, but I'm updating since I did eventually find a workaround for this ioremap problem.

Since we had custom hardware writing directly to the memory, it was probably more correct to mark it uncacheable, but it was unbearably slow and wasn't working for our application. Our solution was to only read from that memory (a ring buffer) once there was enough new data to fill a whole cache line on our architecture (I think that was 256 bytes). This guaranteed we never got stale data, and it was plenty fast.

like image 1
Dave Ceddia Avatar answered Nov 13 '22 00:11

Dave Ceddia