In Z80 machine code, a cheap technique to initialize a buffer to a fixed value, say all blanks. So a chunk of code might look something like this.
LD HL, DESTINATION ; point to the source
LD DE, DESTINATION + 1 ; point to the destination
LD BC, DESTINATION_SIZE - 1 ; copying this many bytes
LD (HL), 0X20 ; put a seed space in the first position
LDIR ; move 1 to 2, 2 to 3...
The result being that the chunk of memory at DESTINATION is completely blank filled. I have experimented with memmove, and memcpy, and can't replicate this behavior. I expected memmove to be able to do it correctly.
Why do memmove and memcpy behave this way?
Is there any reasonable way to do this sort of array initialization?
I am already aware of char array[size] = {0} for array initialization
I am already aware that memset will do the job for single characters.
What other approaches are there to this issue?
memmove
and memcpy
don't work that way because it's not a useful semantic for moving or copying memory. It's handy in the Z80 to do be able to fill memory, but why would you expect a function named "memmove" to fill memory with a single byte? It's for moving blocks of memory around. It's implemented to get the right answer (the source bytes are moved to the destination) regardless of how the blocks overlap. It's useful for it to get the right answer for moving memory blocks.
If you want to fill memory, use memset, which is designed to do just what you want.
There was a quicker way of blanking an area of memory using the stack. Although the use of LDI and LDIR was very common, David Webb (who pushed the ZX Spectrum in all sorts of ways like full screen number countdowns including the border) came up with this technique which is 4 times faster:
The explanation above was taken from the review of David Webbs game Starion.
The Z80 routine might look a little like this:
DI ; disable interrupts which would write to the stack.
LD HL, 0
ADD HL, SP ; save stack pointer
EX DE, HL ; in DE register
LD HL, 0
LD C, 0x18 ; Screen size in pages
LD SP, 0x4000 ; End of screen
PAGE_LOOP:
LD B, 128 ; inner loop iterates 128 times
LOOP:
PUSH HL ; effectively *--SP = 0; *--SP = 0;
DJNZ LOOP ; loop for 256 bytes
DEC C
JP NZ,PAGE_LOOP
EX DE, HL
LD SP, HL ; restore stack pointer
EI ; re-enable interrupts
However, that routine is a little under twice as fast. LDIR copies one byte every 21 cycles. The inner loop copies two bytes every 24 cycles -- 11 cycles for PUSH HL
and 13 for DJNZ LOOP
. To get nearly 4 times as fast simply unroll the inner loop:
LOOP:
PUSH HL
PUSH HL
...
PUSH HL ; repeat 128 times
DEC C
JP NZ,LOOP
That is very nearly 11 cycles every two bytes which is about 3.8 times faster than the 21 cycles per byte of LDIR.
Undoubtedly the technique has been reinvented many times. For example, it appeared earlier in sub-Logic's Flight Simulator 1 for the TRS-80 in 1980.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With