 

Better way to dynamically update tile data on Commodore 64

I'm planning to use software sprites in multicolor char mode for my new C64 project. My idea is to superimpose 'bullet' sprite data onto tile data.

I think I can keep tileset data at address 'TILESET' and sprite data at address 'SPRITE', then combine the two to build a bullet character with a dynamically calculated background and store it at address 'SUPERIMPOSED'.
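For reference, the combine step in the loop below amounts to a bytewise AND of two 8-byte characters. This Python sketch uses hypothetical tile and bullet data: wherever the sprite has 0 bits, the result drops to %00 (the background colour in multicolor mode); wherever it has 1 bits, the tile shows through.

```python
def superimpose(tile: list[int], sprite: list[int]) -> list[int]:
    """Combine two C64 characters (8 bytes, one per pixel row) with a bytewise AND."""
    if len(tile) != 8 or len(sprite) != 8:
        raise ValueError("C64 characters are exactly 8 bytes")
    return [t & s for t, s in zip(tile, sprite)]

# Hypothetical background tile: every multicolor pixel is %01.
TILE = [0b01010101] * 8
# Hypothetical bullet mask: all %11 except a block of %00 pixels in rows 3 and 4.
SPRITE = [0xFF, 0xFF, 0xFF, 0b11000011, 0b11000011, 0xFF, 0xFF, 0xFF]

if __name__ == "__main__":
    for row in superimpose(TILE, SPRITE):
        print(f"{row:08b}")
```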

I wrote the following code and counted its cycles to check whether this is feasible, and I think it is not. The loop eats up 219 cycles, nearly four raster lines, and that doesn't include the other calculations needed before the loop, such as computing the target addresses.

If I want 16 bullets on screen, that will take roughly 64 raster lines, or 8 character rows. That makes me suspicious. Is this the right approach, or is there a more optimized way to do the same job?
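For reference, the raster-line arithmetic: a PAL C64 gives the CPU 63 cycles per raster line, and badlines and sprite DMA steal further cycles, so these figures are on the optimistic side. A small helper (the names are hypothetical):

```python
PAL_CYCLES_PER_LINE = 63  # PAL C64 (6569 VIC-II); NTSC machines differ (64 or 65)

def raster_lines(cycles_per_bullet: int, bullets: int,
                 cycles_per_line: int = PAL_CYCLES_PER_LINE) -> float:
    """Raster lines of CPU time consumed, ignoring badlines and sprite DMA."""
    return cycles_per_bullet * bullets / cycles_per_line

per_bullet = raster_lines(219, 1)    # about 3.5 lines
all_bullets = raster_lines(219, 16)  # about 55.6 lines
print(f"{per_bullet:.2f} lines per bullet, {all_bullets:.1f} lines "
      f"({all_bullets / 8:.1f} character rows) for 16 bullets")
```

Rounding each bullet up to four lines, as the question does, gives the 64-line upper estimate.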

                                 ; cycles
                                 ; ---------
    ldy #$07                     ; 4 x1   = 4
-   lda TILESET,y                ; 3 x8   = 24
    and SPRITE,y                 ; 4 x8   = 32
    sta SUPERIMPOSED,y           ; 5 x8   = 40
    dey                          ; 2 x8   = 16
    cpy #$00                     ; 4 x8   = 32
    bne -                        ; 3 x8-1 = 71
                                 ; ----------
                                 ; 219 cycles

I'm considering using a repeating pattern in the background so that I can reuse the same bullet tile without recalculating it.

asked Sep 27 '15 09:09 by wizofwor



1 Answer

As Jester suggests, as a first optimisation just repeat the lda, and, sta and dey eight times, eliminating the cpy and bne. That'll save 103 cycles immediately. Even if you want to keep the formal loop, notice that dey already sets the zero flag, so you don't need the cpy.
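The claim that the cpy is redundant can be checked exhaustively. The sketch below models only the Z flag, which is the one BNE tests: DEY sets Z when the decremented Y is zero, which is exactly what CPY #$00 would report. It is a Python model of the flag logic, not an emulator.

```python
def dey(y: int) -> tuple[int, bool]:
    """DEY: Y = (Y - 1) mod 256; Z is set when the new Y is zero."""
    y = (y - 1) & 0xFF
    return y, y == 0

def cpy_imm_z(y: int, operand: int) -> bool:
    """Z flag after CPY #operand: set when Y equals the operand."""
    return y == operand

# For every starting Y, Z after DEY equals Z after DEY followed by CPY #$00,
# so BNE branches the same way without the CPY.
redundant = all(dey(y)[1] == cpy_imm_z(dey(y)[0], 0x00) for y in range(256))
print(redundant)  # True
```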

As a second optimisation, consider a compiled sprite. Instead of reading each byte from SPRITE with an indexed AND, you'd code those values directly into your routine as immediate operands, making a distinct routine for each sprite. That'd cut another 16 cycles, since AND #imm takes 2 cycles against 4 for the indexed form.
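To make the compiled-sprite idea concrete, here is a small Python generator that emits one such routine as assembler text. It is a sketch under assumptions: the routine label is hypothetical, Y is loaded with 7 and stepped down with DEY as in the question's loop, and TILESET and SUPERIMPOSED are the question's table labels. Each sprite row becomes the operand of an AND #imm.

```python
def compile_sprite(label: str, sprite: list[int]) -> str:
    """Emit a hypothetical compiled-sprite routine as 6502 assembler text.

    Each row does LDA TILESET,y / AND #<sprite byte> / STA SUPERIMPOSED,y,
    walking Y from 7 down to 0 with DEY; the final DEY is left out.
    """
    if len(sprite) != 8 or any(not 0 <= b <= 0xFF for b in sprite):
        raise ValueError("expected 8 sprite bytes in the range 0..255")
    out = [f"{label}:", "    ldy #$07"]
    for row in range(7, -1, -1):              # Y holds `row` during this step
        out += [
            "    lda TILESET,y",
            f"    and #${sprite[row]:02x}",   # sprite byte baked in as an immediate
            "    sta SUPERIMPOSED,y",
        ]
        if row > 0:
            out.append("    dey")
    out.append("    rts")
    return "\n".join(out)


if __name__ == "__main__":
    print(compile_sprite("bullet", [0xFF, 0xFF, 0xFF, 0xC3, 0xC3, 0xFF, 0xFF, 0xFF]))
```

Running the generator once per sprite at build time produces the "distinct routine for each sprite".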

That being said, your lda would be 4 cycles in an aligned table, not 3 (an indexed load only costs a fifth cycle when it crosses a page boundary), so there are 8 cycles you haven't accounted for. That means the loop, unrolled and specialised to your sprite, comes to 102 cycles (having omitted the final dey).
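The 102 figure can be checked against the documented NMOS 6502 timings. This sketch assumes the unrolled routine is eight LDA abs,Y / AND #imm / STA abs,Y triples with seven DEYs between them and no page crossings, and, like the answer, it leaves out the initial LDY #$07 and any JSR/RTS overhead.

```python
# NMOS 6502 cycle counts for the instructions in the unrolled, compiled routine
# (indexed load assumed not to cross a page boundary).
CYCLES = {
    "lda abs,y": 4,
    "and #imm": 2,
    "sta abs,y": 5,
    "dey": 2,
}

def unrolled_compiled_cycles(rows: int = 8) -> int:
    """Cycles for `rows` LDA/AND/STA triples with a DEY between consecutive rows."""
    per_row = CYCLES["lda abs,y"] + CYCLES["and #imm"] + CYCLES["sta abs,y"]
    return rows * per_row + (rows - 1) * CYCLES["dey"]

print(unrolled_compiled_cycles())  # 102
```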

Without knowing the C64 architecture or what the rest of your code does: if whoever consumes SUPERIMPOSED can read it from the stack page, consider writing the output to the stack rather than via indexed addressing. Just load S with an appropriate seed value and store new results via pha. That'll save two cycles per store (pha takes 3 cycles, an indexed sta takes 5) at the cost of 12 additional cycles of setup and restore.
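One detail matters when writing output with PHA: on the 6502, PHA stores A at $0100+S and then decrements S, so the output grows downward through the stack page. The sketch below models that behaviour in Python; the seed $F7 and the buffer at $01F0 to $01F7 are arbitrary examples. Pushing the bottom row first leaves the eight bytes in top-to-bottom order.

```python
STACK_PAGE = 0x0100

def push_all(memory: bytearray, s: int, values: list[int]) -> int:
    """Model a run of PHA: store at $0100+S, then decrement S within the page."""
    for v in values:
        memory[STACK_PAGE + s] = v
        s = (s - 1) & 0xFF
    return s

mem = bytearray(0x10000)                                   # 64 KB address space
rows = [0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17]    # hypothetical rows 0..7

# Seed S with the address of the LAST byte ($01F7) and push row 7 first,
# so the character ends up in ascending order at $01F0..$01F7.
s_after = push_all(mem, 0xF7, list(reversed(rows)))
print(hex(s_after), list(mem[0x01F0:0x01F8]) == rows)  # 0xef True
```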

Following on from that thought, if you have freedom in how these tables look, consider switching their format: instead of one table that holds all eight bytes of each TILESET entry, use eight tables, each of which holds one byte (one pixel row) of every entry. That'd remove the need to adjust y in the loop; just use a different table in each unrolled iteration.
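The table split can be pictured as turning an array of 8-byte records into eight parallel arrays. This Python sketch (with a hypothetical four-tile set) checks that both layouts address the same bytes. In the split form the row number selects the table, which on the 6502 is the base address written into each unrolled instruction, while the index register just holds the tile number. A side effect is that an 8-bit index can then reach 256 tiles rather than 32.

```python
NUM_TILES = 4  # hypothetical; kept small for illustration

# Interleaved layout (as in the question): row r of tile t at TILESET + 8*t + r.
interleaved = bytes(range(8 * NUM_TILES))

# Split layout: eight tables; table r holds row r of every tile, indexed by t.
split = [bytes(interleaved[8 * t + r] for t in range(NUM_TILES)) for r in range(8)]

def row_interleaved(t: int, r: int) -> int:
    return interleaved[8 * t + r]   # the index changes with r, so Y must be stepped

def row_split(t: int, r: int) -> int:
    return split[r][t]              # r picks the table; the index stays t

same = all(row_interleaved(t, r) == row_split(t, r)
           for t in range(NUM_TILES) for r in range(8))
print(same)  # True
```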

Supposing both TILESET and SUPERIMPOSED can be split into eight tables, that gets you down to:

LDA TILESET1, x
AND #<value>
STA SUPERIMPOSED1, x    ; * 8

[... LDA TILESET2, x ...]

... which is a total of 88 cycles. If SUPERIMPOSED is linear but in the stack page then:

TSX
TXA
LDX #newdest
TXS
TAX                ; adds 10

LDA TILESET1, y
AND #<value>
PHA                ; * 8

[... LDA TILESET2, y ...]

TXS                ; adds 2

... which is 84 cycles.
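The 88- and 84-cycle totals follow from the standard NMOS 6502 timings: indexed LDA 4 (no page crossing), AND #imm 2, indexed STA 5, PHA 3, and 2 each for TSX, TXA, LDX #imm, TXS and TAX. A quick tally of the two routines above:

```python
# NMOS 6502 cycle counts, assuming no page crossings on the indexed loads.
LDA_ABS_INDEXED = 4
AND_IMMEDIATE = 2
STA_ABS_INDEXED = 5
PHA = 3
TWO_CYCLE_OP = 2   # TSX, TXA, LDX #imm, TXS and TAX each take 2 cycles

ROWS = 8

split_tables = ROWS * (LDA_ABS_INDEXED + AND_IMMEDIATE + STA_ABS_INDEXED)

stack_setup = 5 * TWO_CYCLE_OP      # TSX, TXA, LDX #newdest, TXS, TAX
stack_restore = 1 * TWO_CYCLE_OP    # TXS
stack_page = stack_setup + ROWS * (LDA_ABS_INDEXED + AND_IMMEDIATE + PHA) + stack_restore

print(split_tables, stack_page)  # 88 84
```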

Late addition:

If you're willing to premultiply the index in x by 8, effectively reducing your indexable range to 32 tiles, then you can fill a linear output array without adjusting y, as per:

LDA TILESET, x
AND #<value1>
STA SUPERIMPOSED, x

LDA TILESET+1, x
AND #<value2>
STA SUPERIMPOSED+1, x

... etc ...

So you'd still need eight copies of that routine, with different table base addresses, to be able to reach all 256 output tiles. Supposing you have 20 sprites, that makes a total of 20 * 8 = 160 copies of your sprite-plotting routine, each likely to be of the order of 100 bytes, so you're spending about 16 KB.
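A sketch of the bookkeeping behind the premultiplied index, assuming (hypothetically) that copy k of the routine uses tables whose base addresses are 256*k bytes past the first copy's. The tile-to-(copy, X) mapping shows that every one of the 256 tiles is reachable while X + 7 stays within a byte. The byte count only covers the instructions shown (indexed LDA and STA are 3 bytes each, AND #imm is 2) plus an RTS, so it comes in below the answer's round figure of 100 bytes per routine.

```python
TILES, ROWS, COPIES = 256, 8, 8

def locate(tile: int) -> tuple[int, int]:
    """Hypothetical mapping: which routine copy handles a tile, and the X to pass."""
    copy = tile // 32            # copy k's tables start 256*k bytes further on
    x = ROWS * (tile % 32)       # premultiplied index: 0, 8, ..., 248
    return copy, x

for t in range(TILES):
    c, x = locate(t)
    # 256*c + 8*(t % 32) == 8*t, and the last row (X + 7) never overflows a byte
    assert 256 * c + x == 8 * t and x + 7 <= 255

BYTES_PER_ROUTINE = ROWS * (3 + 2 + 3) + 1   # LDA abs,X + AND #imm + STA abs,X per row, plus RTS
SPRITES = 20
total_copies = SPRITES * COPIES
total_bytes = total_copies * BYTES_PER_ROUTINE
print(total_copies, total_bytes)  # 160 10400
```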

If your game is much heavier on one kind of sprite than on others (e.g. it's usually two or three spaceships shooting thousands of bullets at each other), then you can optimise very selectively and keep that total footprint down.

answered Sep 22 '22 12:09 by Tommy
