I have some functions that are designed to handle 1-256 bytes, running on an embedded C platform where passing a byte is much faster and more compact than passing an int (one instruction versus three). What is the preferred way of coding the length parameter?
It is expected that the inner loop of the function will probably represent 15%-30% of processor execution time when the system is busy; it will sometimes be used for small numbers of bytes, and sometimes for large ones. The memory chip used by the function has a per-transaction overhead, and I prefer to have my memory-access function do the start-transaction/do-stuff/end-transaction sequence internally.
The most efficient code would be to simply accept an unsigned char and regard a parameter value of 0 as a request to do 256 bytes, relying on the caller to avoid any accidental attempts to read 0 bytes. That seems a bit dangerous, though. Have others dealt with such issues on embedded systems? How were they handled?
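To make the "0 means 256" convention concrete, here is a minimal host-runnable sketch. All names (`flash_read_raw`, `flash_read_checked`, `flash_read_byte`, `sim_flash`) are hypothetical, and the SPI transfer is replaced by a simulated array, so this only illustrates the counting logic and one possible guard, not the real driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the SPI flash: a simulated data source. */
static uint8_t sim_flash[256];
static uint16_t sim_pos;

static uint8_t flash_read_byte(void)
{
    return sim_flash[sim_pos++ & 0xFFu];
}

/* Raw routine: count 1..255 reads that many bytes; count 0 reads 256.
   The do/while(--count) shape is the one the question reports as
   compiling well for an 8-bit counter. */
static void flash_read_raw(uint8_t *dst, uint8_t count)
{
    assert(dst != NULL);            /* debug-only guard, host builds */
    do {
        *dst++ = flash_read_byte();
    } while (--count);
}

/* Guarded entry point for callers holding a computed length.
   Returns 0 without touching dst when n is outside 1..256. */
static int flash_read_checked(uint8_t *dst, uint16_t n)
{
    if (n == 0u || n > 256u)
        return 0;
    flash_read_raw(dst, (uint8_t)n); /* 256 truncates to 0, i.e. "256" */
    return 1;
}
```

Callers with a compile-time constant length could call the raw routine directly, while callers with a computed length go through the checked one and pay for the 16-bit compare only there.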
EDIT The platform is a PIC18Fxx (128K code space; 3.5K RAM), connecting to an SPI flash chip; reading 256 bytes when fewer are expected would potentially overrun read buffers in the PIC. Writing 256 bytes instead of 0 would corrupt data in the flash chip. The PIC's SPI port is limited to one byte every 12 instruction times if one doesn't check busy status; it will be slower if one does. A typical write transaction requires sending 4 bytes in addition to the data to be received; a read requires an extra byte for "SPI turnaround" (the fastest way to access the SPI port is to read the last byte just before sending the next one).
The compiler is HiTech PICC-18std.
I've generally liked HiTech's PICC-16 compilers. HiTech seems to have diverted its energies away from the PICC-18std product toward the PICC-18pro line, which has even slower compilation times, seems to require 3-byte 'const' pointers rather than 2-byte pointers, and has its own ideas about memory allocation. Maybe I should look more at PICC-18pro, but when I tried compiling my project on an eval version it didn't work and I didn't figure out exactly why (perhaps something about variable layout not agreeing with my asm routines), so I just kept using PICC-18std.
Incidentally, I just discovered that PICC-18 particularly likes do {} while(--bytevar); and particularly dislikes do {} while(--intvar); I wonder what's going through the compiler's "mind" when it generates the latter?
    do
    {
      local_test++;
      --lpw;
    } while(lpw);

    ;newflashpic.c: 792: do
    ;newflashpic.c: 793: {
    0144A8  2AD9    incf    fsr2l,f,c
    ;newflashpic.c: 795: } while(--lpw);
    0144AA  0E00    movlw   low ?_var_test
    0144AC  6EE9    movwf   fsr0l,c
    0144AE  0E01    movlw   high ?_var_test
    0144B0  6EEA    movwf   fsr0h,c
    0144B2  06EE    decf    postinc0,f,c
    0144B4  0E00    movlw   0
    0144B6  5AED    subwfb  postdec0,f,c
    0144B8  50EE    movf    postinc0,w,c
    0144BA  10ED    iorwf   postdec0,w,c
    0144BC  E1F5    bnz     l242
The compiler loads a pointer to the variable, not even using the LFSR instruction (which would take two words) but a combination of MOVLW/MOVWF (taking four). Then it uses this pointer to do the decrement and compare. While I'll admit that do{}while(--wordvar); cannot yield as nice code as do{}while(wordvar--); the code is better than what the latter format actually generates. Doing a separate decrement and while-test (e.g. while (--lpw,lpw)) yields sensible code, but it seems a bit ugly. The post-decrement operator could yield the best code for a down-counting loop:
    decf    _lpw
    btfss   _STATUS,0   ; Skip next inst if carry (i.e. wasn't zero)
    decf    _lpw+1
    bc      loop        ; Carry will be clear only if lpw was zero
but it instead generates worse code than --lpw. The best code would be for an up-counting loop:
    infsnz  _lpw
    incfsz  _lpw+1
    bra     loop
but the compiler doesn't generate that.
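For reference, the up-counting idea can be written in C by negating the count first and incrementing until the counter wraps to zero. This is only a sketch with a hypothetical helper name (`copy_upcount`); I have not verified that PICC-18 turns it into the infsnz/incfsz sequence.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: copies n bytes (n must be 1..65535) using a
   counter that counts UP toward zero instead of down. */
static void copy_upcount(uint8_t *dst, const uint8_t *src, uint16_t n)
{
    uint16_t ctr;

    assert(n != 0u);              /* n == 0 would loop 65536 times */
    ctr = (uint16_t)(0u - n);     /* two's-complement negate: 65536 - n */
    do {
        *dst++ = *src++;
        ++ctr;
    } while (ctr != 0u);          /* wraps to zero after exactly n passes */
}
```

The loop test is then a plain "did the 16-bit increment reach zero" check, which is the condition the hand-written assembly above exploits.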
EDIT 2 Another approach I might use: allocate a global 16-bit variable for the number of bytes, and write the functions so that the counter is always zeroed before exit. Then if only an 8-bit value is required, it would only be necessary to load 8 bits. I'd use macros for stuff so they could be tweaked for best efficiency. On the PIC, using |= on a variable which is known to be zero is never slower than using =, and is sometimes faster. For example, intvar |= 15 or intvar |= 0x300 would be two instructions (each case only has to bother with one byte of the result and can ignore the other); intvar |= 4 (or any power of 2) is one instruction. Obviously on some other processors, intvar = 0x300 would be faster than intvar |= 0x300; if I use a macro it could be tweaked as appropriate.
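A rough host-runnable sketch of that idea follows. The names (`g_xfer_count`, `SET_XFER_COUNT`, `flash_read_n`, `sim_source`) are hypothetical, and the flash is again simulated with an array. The point is the invariant: the counter is zero whenever no transfer is running, so OR-ing a value in behaves like assigning it.

```c
#include <assert.h>
#include <stdint.h>

/* Invariant: g_xfer_count is zero whenever no transfer is in progress. */
static uint16_t g_xfer_count;

/* Because the counter is known to be zero, OR-ing the value in is
   equivalent to assigning it. On the PIC a small constant then only
   touches one byte; on other targets this macro could use '=' instead. */
#define SET_XFER_COUNT(n)  (g_xfer_count |= (uint16_t)(n))

static uint8_t sim_source[256];    /* simulated flash contents */

/* Reads g_xfer_count bytes (1..256 here) and leaves the counter zeroed. */
static void flash_read_n(uint8_t *dst)
{
    const uint8_t *src = sim_source;

    assert(g_xfer_count != 0u);    /* caller must set a count first */
    do {
        *dst++ = *src++;
        --g_xfer_count;
    } while (g_xfer_count != 0u);  /* exit restores the invariant */
}
```

A call site would then look like `SET_XFER_COUNT(4); flash_read_n(buf);`. The cost is that the invariant is only a convention: any early exit from the routine has to clear the counter too.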
Your inner function should copy count + 1 bytes, e.g.,

    do {
        /* copy one byte */
    } while (count-- != 0);
If the post-decrement is slow, other alternatives are:

    ... /* copy one byte */
    while (count != 0) {
        /* copy one byte */;
        count -= 1;
    }

or

    for (;;) {
        /* copy one byte */;
        if (count == 0) break;
        count -= 1;
    }
The caller/wrapper can do:
    if (count > 0 && count <= 256) inner((uint8_t)(count-1));
or
    if (((unsigned)(count - 1)) < 256u) inner((uint8_t)(count-1));

if it's faster in your compiler.
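Putting the pieces together, a complete version might look like the sketch below. The name `copy_checked` and the pointer parameters are my own additions for illustration, and the "copy one byte" step is a plain memory copy rather than an SPI transfer, so the example can run on a host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inner routine: copies count_minus_1 + 1 bytes, so 0..255 maps to 1..256. */
static void inner(uint8_t *dst, const uint8_t *src, uint8_t count_minus_1)
{
    assert(dst != NULL && src != NULL);   /* debug-only guard */
    for (;;) {
        *dst++ = *src++;                  /* copy one byte */
        if (count_minus_1 == 0u)
            break;
        count_minus_1 -= 1u;
    }
}

/* Wrapper: rejects 0 and anything above 256 with one unsigned compare,
   because 0 - 1 wraps to the largest unsigned value. */
static bool copy_checked(uint8_t *dst, const uint8_t *src, unsigned count)
{
    if ((count - 1u) >= 256u)
        return false;
    inner(dst, src, (uint8_t)(count - 1u));
    return true;
}
```

With this encoding there is no parameter value that means "surprise, 256 bytes": a stray 0 passed to `inner` copies a single byte, which is the mildest failure available.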