Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Configuration registers for LPC bus in Poulsbo System Controller Hub (US15W)

We have a system based around an Atom Z510/Intel SCH US15W Q7 card (running Debian Linux.) We need to transfer blocks of data from a device on the Low Pin Count Bus. As far as I know this chipset does not provide DMA facilities, meaning the processor has to read the data out a byte at a time in a software loop. (The device driver actually implements this using the "rep insb" x86 instructions so the loop is actually implemented by the CPU if I understand correctly.)

This is far from optimal, but it should be possible to hit a transfer rate of 14Mb/s. Instead we can barely manage 4Mb/s with transactions on the bus no closer than 2us apart even though each read to the slave device is is done in 560ns. I don't believe other traffic on the bus is to blame, but am still investigating.

My question is:

Does any one know if there are any configuration registers on the SCH that could affect the LPC bus timing?

I cannot find any useful information on the device on the Intel website, nor have I spotted anything in the Linux Kernel code that appears to be fiddling with any such registers (but I'm a noob when it come to Linux Kernel stuff.)

I'm not an x86 expert so any other factors that might come into play or any other 'war stories' relating to this device would be good to know about too.

Edit: I have found the datasheet. I've not seen anything in it that explains this behaviour, but I am investigating the possibility of mapping our device as a firmware device as the firmware bus cycles don't seem to suffer the same delays.

like image 888
sheddenizen Avatar asked Sep 28 '13 21:09

sheddenizen


1 Answers

For the record, the solution was to modify the FPGA firmware such that the chip's data in/out register was mapped to four adjacent addresses and the driver modified to do 32 bit inb/outb instructions. Although the SCH does not implement 32 bit LPC read/write operations, the result is 4 back-to-back 8 bit operations followed by the same dead time as I was getting previously with a single byte, meaning it averages about 1us per byte. Not ideal, but still a doubling in throughput.

It transpires the firmware cycles were quicker because the SCH transfers 64 bytes at a time from the firmware flash - after 64 bytes there is the same 1.4us gap, indicating this is the per-transaction latency of the device. Exploiting this may have been slightly quicker than the above solution however the trade-off is that it is limited to 64 bytes chunks and each byte takes longer (680ns IIRC) due to the additional cycles required to do a firmware read.

like image 66
sheddenizen Avatar answered Nov 09 '22 02:11

sheddenizen