Memory access after ioremap very slow

Question

I'm working on a Linux kernel driver that makes a chunk of physical memory available to user space. I have a working version of the driver, but it's currently very slow. So, I've gone back a few steps and tried making a small, simple driver to recreate the problem.

I reserve the memory at boot time using the kernel parameter memmap=2G$1G. Then, in the driver's __init function, I ioremap some of this memory, and initialize it to a known value. I put in some code to measure the timing as well:

#define RESERVED_REGION_SIZE    (1 * 1024 * 1024 * 1024)   // 1GB
#define RESERVED_REGION_OFFSET  (1 * 1024 * 1024 * 1024)   // 1GB

static int __init memdrv_init(void)
{
    struct timeval t1, t2;
    printk(KERN_INFO "[memdriver] init\n");

    // Remap reserved physical memory (that we grabbed at boot time)
    do_gettimeofday( &t1 );
    reservedBlock = ioremap( RESERVED_REGION_OFFSET, RESERVED_REGION_SIZE );
    do_gettimeofday( &t2 );
    printk( KERN_ERR "[memdriver] ioremap() took %d usec\n", usec_diff( &t2, &t1 ) );

    // Set the memory to a known value
    do_gettimeofday( &t1 );
    memset( reservedBlock, 0xAB, RESERVED_REGION_SIZE );
    do_gettimeofday( &t2 );
    printk( KERN_ERR "[memdriver] memset() took %d usec\n", usec_diff( &t2, &t1 ) );

    // Register the character device
    ...

    return 0;
}

I load the driver, and check dmesg. It reports:

[memdriver] init
[memdriver] ioremap() took 76268 usec
[memdriver] memset() took 12622779 usec

That's 12.6 seconds for the memset, which means it is writing 1 GiB at only about 81 MiB/sec. Why on earth is it so slow?
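For anyone checking the arithmetic, here is a minimal sketch of how that throughput figure falls out of the dmesg timing. The helper name `throughput_mib_per_sec` is made up for illustration and is not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the driver): throughput in MiB/s for
 * `bytes` bytes moved in `usec` microseconds.
 */
static double throughput_mib_per_sec(uint64_t bytes, uint64_t usec)
{
    assert(usec > 0);  /* avoid dividing by zero */

    double seconds = (double)usec / 1e6;
    double mib = (double)bytes / (1024.0 * 1024.0);
    return mib / seconds;
}
```

Calling `throughput_mib_per_sec(1ULL << 30, 12622779)` with the region size and the memset time reported above gives a value just over 81.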

This is kernel 2.6.34 on Fedora 13, and it's an x86_64 system.

EDIT:

The goal behind this scheme is to take a chunk of physical memory and make it available to both a PCI device (via the memory's bus/physical address) and a user space application (via a call to mmap, supported by the driver). The PCI device will then continually fill this memory with data, and the user-space app will read it out. If ioremap is a bad way to do this (as Ben suggested below), I'm open to other suggestions that'll allow me to get any large chunk of memory that can be directly accessed by both hardware and software. I can probably make do with a smaller buffer also.

See my eventual solution below.

Eric Seppanen · Accepted Answer

ioremap does not allocate memory; it maps the given physical range, and it maps it uncacheable, as you'd want for access to a memory-mapped I/O device. That would explain your poor performance.

You probably want kmalloc or vmalloc. The usual reference materials will explain the capabilities of each.

Ben Jackson · Answer

I don't think ioremap() is what you want there. You should only access the result (what you call reservedBlock) with readb, readl, writeb, memcpy_toio etc. It is not even guaranteed that the return is virtually mapped (although it apparently is on your platform). I'd guess that the region is being mapped uncached (suitable for IO registers) leading to the terrible performance.

Dave Ceddia · Answer

It's been a while, but I'm updating since I did eventually find a workaround for this ioremap problem.

Since we had custom hardware writing directly to the memory, it was probably more correct to mark it uncacheable, but it was unbearably slow and wasn't working for our application. Our solution was to only read from that memory (a ring buffer) once there was enough new data to fill a whole cache line on our architecture (I think that was 256 bytes). This guaranteed we never got stale data, and it was plenty fast.
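The bookkeeping for "only read once a whole cache line of new data is available" can be sketched roughly as below. This is an illustration under assumptions, not our actual driver code: the function name, the idea that the consumer can see the device's write offset, and the parameters are all hypothetical, and the line size is passed in rather than hard-coded.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: given the consumer's read offset and the
 * producer's (device's) write offset into a ring buffer, return how
 * many bytes may be consumed right now, rounded down to whole cache
 * lines. Assumes the read offset always sits on a line boundary and
 * that the ring size is a multiple of the line size.
 */
static size_t readable_whole_lines(size_t read_pos, size_t write_pos,
                                   size_t ring_size, size_t line_size)
{
    assert(line_size > 0 && ring_size % line_size == 0);
    assert(read_pos < ring_size && write_pos < ring_size);
    assert(read_pos % line_size == 0);

    /* Bytes written but not yet consumed, allowing for wrap-around. */
    size_t avail = (write_pos + ring_size - read_pos) % ring_size;

    /* Leave any partially filled line for a later pass. */
    return avail - (avail % line_size);
}
```

With a 4096-byte ring and 256-byte lines, a write offset of 700 and a read offset of 0 would let the reader take 512 bytes and leave the partial third line for later.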

Tags: performance, memory, linux-kernel, kernel, driver
