
Why is there no Z80 like LDIR functionality in C/C++/rtl?

Tags: c++, c, z80

In Z80 machine code, there is a cheap technique for initializing a buffer to a fixed value, say all blanks. A chunk of code might look something like this.

LD HL, DESTINATION             ; point to the source
LD DE, DESTINATION + 1         ; point to the destination
LD BC, DESTINATION_SIZE - 1    ; copying this many bytes
LD (HL), 0X20                  ; put a seed space in the first position
LDIR                           ; move 1 to 2, 2 to 3...

The result is that the chunk of memory at DESTINATION is completely blank-filled. I have experimented with memmove and memcpy, and can't replicate this behavior. I expected memmove to be able to do it correctly.

Why do memmove and memcpy behave this way?

Is there any reasonable way to do this sort of array initialization?

I am already aware of char array[size] = {0} for array initialization.

I am already aware that memset will do the job for single characters.

What other approaches are there to this issue?

asked Dec 22 '08 22:12 by EvilTeach


2 Answers

memmove and memcpy don't work that way because it's not a useful semantic for moving or copying memory. It's handy on the Z80 to be able to fill memory this way, but why would you expect a function named "memmove" to fill memory with a single byte? It's for moving blocks of memory around. It's implemented to get the right answer (the source bytes arrive intact at the destination) regardless of how the blocks overlap, and that is the useful behavior for moving memory blocks.
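
To make the difference concrete, here is a minimal C sketch. The names ldir_copy, ldir_blank_fill and memmove_attempt are hypothetical helpers invented for this illustration, not standard functions. ldir_copy copies strictly forward, one byte at a time, so an overlapping copy re-reads bytes it has just written and the seed propagates. memmove behaves as if the source were first copied to a temporary buffer, so the same call shifts the existing bytes up by one position instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper modelling LDIR: copy n bytes strictly forward,
   one byte at a time, even when the two regions overlap. */
void ldir_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n--) {
        *dst++ = *src++;   /* like (DE) <- (HL); INC HL; INC DE; DEC BC */
    }
}

/* Blank-fill buf[0..size-1] the Z80 way: seed one byte, then smear it. */
void ldir_blank_fill(unsigned char *buf, size_t size)
{
    if (size == 0) return;
    buf[0] = 0x20;                       /* seed space, as LD (HL), 0x20 */
    ldir_copy(buf + 1, buf, size - 1);   /* move 1 to 2, 2 to 3, ... */
}

/* Same seed, but with memmove: the bytes that were in the buffer are
   shifted up by one position rather than overwritten with the seed. */
void memmove_attempt(unsigned char *buf, size_t size)
{
    if (size == 0) return;
    buf[0] = 0x20;
    memmove(buf + 1, buf, size - 1);
}
```

Starting from the eight bytes "ABCDEFGH", ldir_blank_fill leaves eight spaces, while memmove_attempt leaves two spaces followed by "BCDEFG".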

If you want to fill memory, use memset, which is designed to do just what you want.
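
For the blank fill in the question, memset is a one-liner. If the goal is to repeat a multi-byte pattern, which memset cannot do, one possible approach is to seed the pattern once and then memcpy the already-filled prefix onto the unfilled tail, doubling the filled region each time. The source and destination of each memcpy never overlap, so the call is valid. This is only a sketch: blank_fill and fill_pattern are hypothetical names, and it assumes pat does not point into buf.

```c
#include <stddef.h>
#include <string.h>

/* Fill a buffer with spaces: the C counterpart of the LDIR idiom. */
void blank_fill(char *buf, size_t size)
{
    memset(buf, ' ', size);
}

/* Hypothetical helper: tile buf[0..size-1] with a pat_len-byte pattern.
   Assumes pat does not overlap buf. */
void fill_pattern(char *buf, size_t size, const char *pat, size_t pat_len)
{
    if (size == 0 || pat_len == 0) return;

    size_t filled = pat_len < size ? pat_len : size;
    memcpy(buf, pat, filled);              /* seed with one copy of the pattern */

    while (filled < size) {
        size_t chunk = size - filled;
        if (chunk > filled) chunk = filled;
        /* [0, chunk) and [filled, filled + chunk) are disjoint */
        memcpy(buf + filled, buf, chunk);
        filled += chunk;
    }
}
```

For example, fill_pattern(buf, 7, "ab", 2) leaves the seven bytes "abababa" in buf.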

answered Sep 18 '22 18:09 by Ned Batchelder


There was a quicker way of blanking an area of memory using the stack. Although the use of LDI and LDIR was very common, David Webb (who pushed the ZX Spectrum in all sorts of ways like full screen number countdowns including the border) came up with this technique which is 4 times faster:

  • saves the Stack Pointer and then moves it to the end of the screen.
  • LOADs the HL register pair with zero,
  • goes into a massive loop PUSHing HL onto the Stack.
  • The Stack moves up the screen and down through memory and in the process, clears the screen.

The explanation above was taken from a review of David Webb's game Starion.

The Z80 routine might look a little like this:

  DI              ; disable interrupts which would write to the stack.
  LD HL, 0
  ADD HL, SP      ; save stack pointer
  EX DE, HL       ; in DE register
  LD HL, 0
  LD C, 0x18      ; Screen size in pages
  LD SP, 0x5800   ; End of the ZX Spectrum pixel area (0x4000 + 0x1800)
PAGE_LOOP:
  LD B, 128       ; inner loop iterates 128 times
LOOP:
  PUSH HL         ; effectively *--SP = 0; *--SP = 0;
  DJNZ LOOP       ; loop for 256 bytes
  DEC C
  JP NZ,PAGE_LOOP
  EX DE, HL
  LD SP, HL       ; restore stack pointer
  EI              ; re-enable interrupts

However, that routine is only a little under twice as fast as LDIR. LDIR copies one byte every 21 cycles. The inner loop copies two bytes every 24 cycles: 11 cycles for PUSH HL and 13 for DJNZ LOOP. To get nearly 4 times as fast, simply unroll the inner loop:

LOOP:
   PUSH HL
   PUSH HL
   ...
   PUSH HL         ; repeat 128 times
   DEC C
   JP NZ,LOOP

That is very nearly 11 cycles for every two bytes, which is about 3.8 times faster than the 21 cycles per byte of LDIR.
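
The arithmetic behind those two figures can be checked with a few lines of C. This is only a sketch using the cycle counts quoted above; the function and constant names are made up for illustration, and the unrolled figure ignores the DEC C / JP NZ overhead paid once per 128 pushes.

```c
/* Cycle (T-state) costs as quoted in the text above. */
enum {
    LDIR_PER_BYTE = 21,  /* LDIR, per byte copied */
    PUSH_HL       = 11,  /* PUSH HL, writes two bytes */
    DJNZ_TAKEN    = 13   /* DJNZ when the branch is taken */
};

/* Looped version: one PUSH HL plus one DJNZ for every two bytes. */
double looped_cycles_per_byte(void)
{
    return (PUSH_HL + DJNZ_TAKEN) / 2.0;
}

/* Unrolled version: one PUSH HL for every two bytes
   (loop overhead per 128 pushes is ignored here). */
double unrolled_cycles_per_byte(void)
{
    return PUSH_HL / 2.0;
}

/* How many times faster than LDIR a given per-byte cost is. */
double speedup_over_ldir(double cycles_per_byte)
{
    return LDIR_PER_BYTE / cycles_per_byte;
}
```

With these figures the looped routine costs 12 cycles per byte (21 / 12 = 1.75 times faster than LDIR) and the unrolled one 5.5 cycles per byte (21 / 5.5, roughly 3.8 times faster).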

Undoubtedly the technique has been reinvented many times. For example, it appeared earlier in sub-Logic's Flight Simulator 1 for the TRS-80 in 1980.

answered Sep 20 '22 18:09 by devstopfix