I'm programming an embedded powerpc 32 system with a 32 kbyte 8-way set associative L2 instruction cache. To avoid cache thrashing we align functions in a way such that the text of a set of functions called at a high frequency (think interrupt code) ends up in separate cache sets. We do this by inserting dummy functions as needed, e.g.
void high_freq1(void)
{
...
}
void dummy(void)
{
__asm__(/* Silly opcodes to fill ~100 to ~1000 bytes of text segment */);
}
void high_freq2(void)
{
...
}
This strikes me as ugly and suboptimal. What I'd like to do is avoid __asm__ entirely and use pure C89 (maybe C99), and have a dummy() spacer that the GCC optimizer does not touch. The size of the dummy() spacer should be configurable as a multiple of 4 bytes. Typical spacers are 260 to 1000 bytes.
I'm also willing to explore entirely new techniques of placing a set of selected functions in a way so they aren't mapped to the same cache lines. Can a linker script do this?
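To make the placement problem concrete, here is a minimal sketch of how a code address maps to a cache set. The 32 KiB size and 8 ways come from the question; the 32-byte line size is an assumption (check your core's manual), and the helper names cache_set_of and sets_overlap are hypothetical.

```c
#include <stdint.h>

/* Cache geometry: size and ways are from the question, line size is assumed. */
#define CACHE_SIZE 32768u
#define CACHE_WAYS 8u
#define CACHE_LINE 32u /* assumption, not stated in the question */
#define CACHE_SETS (CACHE_SIZE / (CACHE_WAYS * CACHE_LINE)) /* 128 sets */

/* Index of the cache set that the line containing addr maps to. */
static unsigned cache_set_of(uintptr_t addr)
{
    return (unsigned)((addr / CACHE_LINE) % CACHE_SETS);
}

/* Nonzero if the byte ranges [a, a+a_len) and [b, b+b_len)
 * occupy at least one common cache set. */
static int sets_overlap(uintptr_t a, uintptr_t a_len,
                        uintptr_t b, uintptr_t b_len)
{
    unsigned char used[CACHE_SETS] = {0};
    uintptr_t line;

    if (a_len == 0 || b_len == 0)
        return 0;
    /* Mark every set touched by the first range. */
    for (line = a / CACHE_LINE; line <= (a + a_len - 1) / CACHE_LINE; line++)
        used[line % CACHE_SETS] = 1;
    /* Report a hit as soon as the second range touches a marked set. */
    for (line = b / CACHE_LINE; line <= (b + b_len - 1) / CACHE_LINE; line++)
        if (used[line % CACHE_SETS])
            return 1;
    return 0;
}
```

With this geometry, addresses that are 4 KiB apart (cache size divided by the number of ways) land in the same set, so a spacer only helps if it shifts a function's start by something other than a multiple of 4 KiB.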
Use GCC's __attribute__(( aligned(size) )).
Or, pass -falign-functions=n on your GCC command line.
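A minimal sketch of the attribute form, assuming GCC (or a compatible compiler) and a 32-byte cache line. HOT_ALIGN, hot_isr_a, hot_isr_b and starts_on_line are hypothetical names used only for illustration.

```c
#include <stdint.h>

#define HOT_ALIGN 32 /* assumption: cache line size in bytes */

/* Ask the compiler to start each function on a HOT_ALIGN-byte boundary. */
__attribute__((aligned(HOT_ALIGN)))
int hot_isr_a(void) { return 1; }

__attribute__((aligned(HOT_ALIGN)))
int hot_isr_b(void) { return 2; }

/* Nonzero if fn starts on a HOT_ALIGN-byte boundary.
 * Casting a function pointer to uintptr_t is implementation-defined,
 * but works on the usual GCC targets. */
int starts_on_line(int (*fn)(void))
{
    return ((uintptr_t)fn % HOT_ALIGN) == 0;
}
```

Note that alignment only controls where a function starts relative to a boundary. It does not by itself pick which cache set the function lands in, so it may need to be combined with one of the placement techniques discussed in the other answers.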
Maybe linker scripts are the way to go. The GNU linker can use these I think... I've used LD files for the AVR and on MQX, both of which were using GCC-based compilers... might help...
You can define your memory sections etc. and what goes where... Each time I come to write one it's been so long since the last that I have to read up again...
Have a search for SVR3-style command files to gen up.
DISCLAIMER: Following example for a very specific compiler... but the SVR3-like format is pretty general... you'll have to read up for your system
For example you can use commands like...
ApplicationStart = 0x...;
MemoryBlockSize = 0x...;
ApplicationDataSize = 0x...;
ApplicationLength = MemoryBlockSize - ApplicationDataSize;
MEMORY {
RAM: ORIGIN = 0x... LENGTH = 1M
ROM: ORIGIN = ApplicationStart LENGTH = ApplicationLength
}
This defines two memory regions (RAM and ROM) for the linker. Then you can say things like
SECTIONS
{
GROUP :
{
.text :
{
* (.text)
* (.init , '.init$*')
* (.fini , '.fini$*')
}
.my_special_text ALIGN(32):
{
* (.my_special_text)
}
.initdat ALIGN(4):
// Blah blah
} > ROM
// SNIP
}
The SECTIONS command tells the linker how to map input sections into output sections, and how to place the output sections in memory... Here we're saying what is going into the ROM region, which we defined in the MEMORY definition above. The bit possibly of interest to you is .my_special_text. In your code you can then do things like...
__attribute__ ((section(".my_special_text")))
void MySpecialFunction(...)
{
....
}
The linker will put any function preceded by the __attribute__ statement into the .my_special_text section. In the above example this is placed into ROM on the next 32-byte aligned boundary (the ALIGN(32)) after the .text section, but you can put it anywhere you like. So you could make a few sections, one for each of the functions you describe, and make sure the addresses won't cause clashes...
You can get the size and memory location of the section using linker-defined variables of the form
extern char _fsection_name[]; // Set to the address of the start of section_name
extern char _esection_name[]; // Set to the first byte following section_name
So for this example...
extern char _fmy_special_text[]; // Set to the address of the start of my_special_text
extern char _emy_special_text[]; // Set to the first byte following my_special_text
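The _f/_e names above belong to that specific toolchain. As a hedged sketch of the same idea with GNU ld: when a section's name is a valid C identifier and it ends up as its own output section, GNU ld provides __start_NAME and __stop_NAME symbols. The section name hot_text and the functions hot_fn and hot_text_size are hypothetical.

```c
#include <stddef.h>

/* Place the function in a section whose name is a valid C identifier,
 * so GNU ld can define __start_hot_text / __stop_hot_text for it. */
__attribute__((section("hot_text"), used))
int hot_fn(int x) { return x + 1; }

/* Provided by GNU ld (assumption: GNU ld or a compatible linker). */
extern char __start_hot_text[];
extern char __stop_hot_text[];

/* Size of the hot_text section in bytes. */
size_t hot_text_size(void)
{
    return (size_t)(__stop_hot_text - __start_hot_text);
}
```

The start address and the size together tell you which cache sets the section occupies, which can be checked at startup or in a test build.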
If you are willing to expend some effort, you can use __attribute__((section(".text.hotpath.a"))) to place the function into a separate section, and then in a custom linker script explicitly place the functions.
This gives you a bit more fine-grained control than simply asking for the functions to be aligned, but requires more hand-holding.
Example, assuming that you want to lock 4KiB into cache:
SECTIONS {
.text.hotpath.one BLOCK(0x1000) : {
*(.text.hotpath.a)
*(.text.hotpath.b)
}
}
ASSERT(SIZEOF(.text.hotpath.one) <= 0x1000, "Hot Path functions do not fit into 4KiB")
This will make sure the hot path functions a and b are next to each other and both fit into the same block of 4 KiB that is aligned on a 4 KiB boundary, so you can simply lock that page into the cache; if the code doesn't fit, you get an error.
You can even use NOCROSSREFS(.text.hotpath.one .text) to forbid hot path functions calling other functions.
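For completeness, a minimal sketch of the C side that such a linker script would pick up, assuming GCC. The function names hot_a and hot_b and their bodies are hypothetical; only the section names come from the script above.

```c
/* Each hot function goes into its own input section so the linker
 * script can place them explicitly and in a chosen order. */
__attribute__((section(".text.hotpath.a")))
int hot_a(int x) { return x * 2; }

/* hot_b calls only hot_a, which lives in the same output section,
 * so it would not violate a NOCROSSREFS rule against .text. */
__attribute__((section(".text.hotpath.b")))
int hot_b(int x) { return hot_a(x) + 1; }
```

Without the custom script, a typical default GNU ld script simply folds these input sections into .text, so the code still links and runs; the script is what adds the placement guarantee.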
Assuming you're using GCC and GAS, this may be a simple solution for you:
void high_freq1(void)
{
...
}
asm(".org .+288"); /* Advance location by 288 bytes */
void high_freq2(void)
{
...
}
You could, possibly, even use it to set absolute locations for the functions rather than using relative increments in address, which would insulate you from consequences due to the functions changing in size when/if you modify them.
It's not pure C89, for sure, but it may be less ugly than using dummy functions. :)
(Then again, it should be mentioned that linker scripts aren't standardized either.)
EDIT: As noted in the comments, it seems to be important to pass the -fno-toplevel-reorder flag to GCC in this case.
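Since the question asks for a configurable spacer size, here is a hedged sketch that wraps the .org directive in a macro, assuming GCC and GAS. TEXT_SPACER and the SPACER_STR helpers are hypothetical names, and the spacer only lands between the two functions if GCC keeps top-level order (-fno-toplevel-reorder).

```c
/* Two-step stringification so that a macro argument is expanded
 * before being turned into a string. */
#define SPACER_STR_(x) #x
#define SPACER_STR(x) SPACER_STR_(x)

/* Emit a top-level asm statement that advances the location counter
 * of the current section by `bytes` bytes. */
#define TEXT_SPACER(bytes) __asm__(".org .+" SPACER_STR(bytes))

int high_freq1(void) { return 1; }

TEXT_SPACER(288); /* keep this a multiple of 4 on PowerPC */

int high_freq2(void) { return 2; }
```

The macro does not check that the size is a multiple of 4; that is left to the caller in this sketch.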