I've started to implement an 8086/8088 with the goal of being cycle-exact. I can understand the reasoning behind the number of clock cycles for most instructions, but I must say I'm quite puzzled by the Effective Address (EA) calculation time.
More specifically, why does computing BP + DI or BX + SI take 7 cycles, but computing BP + SI or BX + DI take 8 cycles?
I could just wait for a given number of cycles, but I'm really interested in knowing why there's this 1-cycle difference (and overall why it takes so many cycles to do any EA calculation, since EA uses the ALU for computing addresses, and an ADD between registers is just 3 cycles).
Without reverse engineering the chip, I don't think it's possible to explain the difference in cycles between [BP + SI] and [BP + DI]. (It's not entirely out of the question that someone has done or will do the necessary reverse engineering; it's been done for some of the chips in the Commodore 64 in order to create more exact emulators.) It is, however, fairly easy to explain why effective address calculations in general take so long. The calculation for [BX + SI] is actually DS * 16 + BX + SI, so it's two adds, not just one. It's also a 20-bit calculation and the ALU is only 16 bits wide, so it takes one more add to calculate the upper bits of the 20-bit physical address. That's the equivalent of three register-to-register adds, which would cost a total of 9 cycles, and that assumes the 4-bit shift is free, so the EA calculation is actually faster than the equivalent instructions.
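To make the arithmetic concrete, here is a minimal Python sketch of the address calculation described above. The function names and the `EA_CYCLES` table are hypothetical, made up for illustration; the cycle counts are just the ones quoted in the question, and nothing here models how the real chip sequences the adds internally.

```python
# Sketch only: models the arithmetic of an 8086 address, not its internal timing.

def effective_address(base: int, index: int, disp: int = 0) -> int:
    """16-bit offset (the EA): base + index + displacement, wrapping at 64 KiB."""
    return (base + index + disp) & 0xFFFF


def physical_address(segment: int, offset: int) -> int:
    """20-bit physical address: segment * 16 + offset, wrapping at 1 MiB."""
    return ((segment << 4) + offset) & 0xFFFFF


# EA cycle counts as quoted in the question (base register, index register).
# A cycle-exact emulator could simply look these up and wait that long.
EA_CYCLES = {
    ("BP", "DI"): 7,
    ("BX", "SI"): 7,
    ("BP", "SI"): 8,
    ("BX", "DI"): 8,
}

if __name__ == "__main__":
    ds, bx, si = 0x1234, 0x0010, 0x0020
    ea = effective_address(bx, si)
    print(f"EA = {ea:#06x}, physical = {physical_address(ds, ea):#07x}, "
          f"cycles = {EA_CYCLES[('BX', 'SI')]}")
```

For [BX + SI] with DS = 0x1234, BX = 0x0010 and SI = 0x0020, the offset is 0x0030 and the physical address is 0x12340 + 0x0030 = 0x12370. The two masks matter for an emulator: the offset wraps at 16 bits before the segment is added, and the final sum wraps at 20 bits.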