I'm looking for a fast and memory efficient approach for implementing Conway's Game of Life.
Constraints: a 96x128 board, approximately 2kB RAM available and 52MHz processor (see the tech specs here: http://www.getinpulse.com/features).
My current naive solution that represents each cell as a single bit in a matrix (96*128/8=1,536 bytes) works but is too slow. What tricks can be used to improve performance?
Storing the coordinates of live cells (for example in this implementation http://dotat.at/prog/life/life.html) would use too much memory.
Looks like a fun piece of hardware. Storing 1 bit per pixel of a 96x128 pixel display gives 12,288 bits. That's already over half of the 16,384 bits you say are "available". If you can't even store 2 bits per cell, there's not a lot of room to do much.
A few ideas:
You have a 32-bit processor. There are several bit-twiddling tricks to take a bitmap and calculate the number of neighbors of several cells in parallel on such a processor.
It's often faster to store the neighbor counts, incrementing all 8 neighbors at birth and decrementing all 8 neighbors at death, rather than recalculating the number of neighbors from scratch every generation -- but it doesn't look like you have enough memory for this approach.
Perhaps you could use 2x2 pixels per cell (so only 3,072 cells visible) or 3x3 pixels per cell (so only 1,376 cells visible), so your software does less work and so gives the illusion of running faster. (This also frees up enough RAM that you can do the neighbor count mentioned above).
Since many life patterns have large areas of dead space, perhaps have a small bitmap of "live regions" -- say, a 12x16 array of bits, where each bit is "0" if the corresponding 8x8 cell region is entirely dead, and "1" if any of the cells in the corresponding region is alive. The next-generation update only needs to look at live regions and the 1-cell-wide border of dead regions; you can skip checking the 6x6 cell core of dead regions. Also, you can completely skip the entire 8x8 cell region if its 4 nearest neighboring regions are also dead.
Since many life patterns have large areas of static unchanging "still life" patterns, perhaps have a small bitmap of "dynamic regions" -- say, a 12x16 array of bits, where each bit is "0" if the corresponding 8x8 cell region had no changes in the last generation update, and "1" if any of the cells in the corresponding region changed in the last update. The next generation update only needs to look at dynamic regions and the 1-cell-wide border of static regions; you can skip checking the 6x6 cell core of static regions, since it will be the same in the next generation.
If a pattern is "large enough", a representation based on Gosper's Hashlife may be able to store it in less RAM than storing a bitmap directly. Alas, I suspect you're far below the "large enough" threshold.
It is quite common with small life universes to make them wrap round on all sides (to make a toroidal universe) but this requires double buffering. In your case this requires 3KB RAM but you only have 2KB.
If you don't wrap then you don't need to double-buffer the whole universe. You just need to avoid overwriting cells before you have finished using them as input to the calculation.
Say your universe is laid out as a conventional bitmap. We are going to treat it as a series of rows arranges sequentially in memory. Say the universe has four rows numbered 0 to 3.
0
1
2
3
4
...
When you calculate the next generation, the new version of row 3 is calculated using rows 2, 3, and 4 (which is blank). You can write the new version of row 3 on top of row 4. Similarly, calculate the new row 2 from rows 1,2,3 and write it on top of row 3, since that data is no longer needed after calculating row 2. The new row 1 is calculated from rows 0,1,2 and written over row 2.
This means you only need memory for one extra row. 97*128 bits is 1552 bytes.
The disadvantage is that your universe scrolls through memory and you have to have some mechanism to deal with this. Either copy it back to its original position after each calculation (which is slow) or arrange to be able to display it from any position in memory, and ensure that your calculation and display systems can wrap around from high to low addresses.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With