I am trying to implement AES-256 in CTR mode using NVIDIA CUDA. I have written the key expansion on the CPU, and now I need to implement the actual AES-256 algorithm. According to Wikipedia, some code I've seen, and particularly this PDF (page 9), AES rounds can be implemented as a series of table lookups. My question is: how do I generate these tables? I am aware that I need 4 KB to store them, and that is not a problem. I have spent a whole day trying to find these tables, with no success. The PDF I linked mentions lookup tables T0, T1, T2 and T3, but I do not know what these are. It also mentions round keys 4, 5, 6 and 7, but I do not understand what these indices refer to.
The closest I have come to figuring out how to generate these lookup tables is from this project. Inside the code, there is a comment that says:
Te0[x] = S [x].[02, 01, 01, 03];
Te1[x] = S [x].[03, 02, 01, 01];
Te2[x] = S [x].[01, 03, 02, 01];
Te3[x] = S [x].[01, 01, 03, 02];
However, I'm not entirely sure what that notation means (is it a matrix multiplication or something else?). The only things I recognize are the constant matrix from the MixColumns step and the S-box.
[Edit] Now that someone has pointed it out: how can a lookup-table implementation actually be slower? Would it be wise to implement AES without lookup tables here?
The T tables are a straightforward description of the AES round transformation in matrix form. To build them, see the original Rijndael NIST proposal, section 5.2.1.
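As a rough illustration of that construction, here is a minimal C sketch. It assumes the convention used in the comment quoted in the question: S[x].[02, 01, 01, 03] is read as a 32-bit word whose bytes, from most to least significant, are S[x] multiplied in GF(2^8) by 02, 01, 01 and 03. The function names (gf_mul, build_sbox, build_te_tables, aes_round) are made up for this example and do not come from any library. The round function also shows one plausible reading of "round keys 4, 5, 6 and 7": word indices into the expanded key, where words 0 to 3 are used for the initial AddRoundKey and words 4 to 7 for round 1. That reading is an assumption about the PDF, not something confirmed here.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1)
            p ^= a;
        uint8_t carry = a & 0x80;
        a = (uint8_t)(a << 1);
        if (carry)
            a ^= 0x1B;
        b >>= 1;
    }
    return p;
}

static uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* S[x] = affine transform of the multiplicative inverse of x (inverse of 0 is 0).
   The inverse is found by brute force, which is fine for a one-time table build. */
static void build_sbox(uint8_t sbox[256])
{
    for (int x = 0; x < 256; x++) {
        uint8_t inv = 0;
        for (int y = 1; y < 256 && x != 0; y++) {
            if (gf_mul((uint8_t)x, (uint8_t)y) == 1) {
                inv = (uint8_t)y;
                break;
            }
        }
        sbox[x] = (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                                ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
    }
}

/* Each table entry packs one column of the MixColumns matrix times S[x]. */
void build_te_tables(uint32_t Te0[256], uint32_t Te1[256],
                     uint32_t Te2[256], uint32_t Te3[256])
{
    uint8_t sbox[256];
    build_sbox(sbox);
    assert(sbox[0x00] == 0x63); /* sanity check against the published S-box */

    for (int x = 0; x < 256; x++) {
        uint32_t s1 = sbox[x];            /* 01 * S[x] */
        uint32_t s2 = gf_mul(sbox[x], 2); /* 02 * S[x] */
        uint32_t s3 = gf_mul(sbox[x], 3); /* 03 * S[x] */
        Te0[x] = (s2 << 24) | (s1 << 16) | (s1 << 8) | s3; /* [02, 01, 01, 03] */
        Te1[x] = (s3 << 24) | (s2 << 16) | (s1 << 8) | s1; /* [03, 02, 01, 01] */
        Te2[x] = (s1 << 24) | (s3 << 16) | (s2 << 8) | s1; /* [01, 03, 02, 01] */
        Te3[x] = (s1 << 24) | (s1 << 16) | (s3 << 8) | s2; /* [01, 01, 03, 02] */
    }
}

/* One full round (SubBytes, ShiftRows, MixColumns, AddRoundKey).
   s holds the four state columns as big-endian words; rk points at the
   four round-key words for this round (assumed: expanded_key + 4 * round). */
void aes_round(const uint32_t Te0[256], const uint32_t Te1[256],
               const uint32_t Te2[256], const uint32_t Te3[256],
               const uint32_t s[4], const uint32_t rk[4], uint32_t t[4])
{
    for (int c = 0; c < 4; c++) {
        t[c] = Te0[s[c] >> 24]
             ^ Te1[(s[(c + 1) & 3] >> 16) & 0xff]
             ^ Te2[(s[(c + 2) & 3] >> 8) & 0xff]
             ^ Te3[s[(c + 3) & 3] & 0xff]
             ^ rk[c];
    }
}
```

With this byte order, Te0[0] comes out as 0xc66363a5, since S[0] = 0x63, 02 * 0x63 = 0xc6 and 03 * 0x63 = 0xa5. Te1, Te2 and Te3 are just Te0 rotated right by 8, 16 and 24 bits, so if 4 KB of table space ever becomes a concern, a single 1 KB table plus rotations gives the same result. Note that the final AES round has no MixColumns step, so it cannot use these tables as they are.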