I am implementing GHASH for AES-GCM and need to implement its definition, where v is the bit length of the final block of A, u is the bit length of the final block of C, and || denotes concatenation of bit strings.
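For reference, this is GHASH as it is usually written in the GCM specification by McGrew and Viega, which is assumed here to be the definition meant above. A is split into m blocks and C into n blocks, and A*_m and C*_n denote the possibly short final blocks:

```latex
X_i =
\begin{cases}
0 & \text{for } i = 0 \\
(X_{i-1} \oplus A_i) \cdot H & \text{for } i = 1, \ldots, m-1 \\
(X_{m-1} \oplus (A^*_m \,\|\, 0^{128-v})) \cdot H & \text{for } i = m \\
(X_{i-1} \oplus C_{i-m}) \cdot H & \text{for } i = m+1, \ldots, m+n-1 \\
(X_{m+n-1} \oplus (C^*_n \,\|\, 0^{128-u})) \cdot H & \text{for } i = m+n \\
(X_{m+n} \oplus (\mathrm{len}(A) \,\|\, \mathrm{len}(C))) \cdot H & \text{for } i = m+n+1
\end{cases}
```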
How can I concatenate the final block of A with zeros to pad it from v bits up to 128 bits, given that I do not know the length of the whole of A? For now I just take the block and XOR it with a 128-bit array of zeros:
void GHASH(uint8_t H[16], uint8_t len_A, uint8_t A_i[len_A], uint8_t len_C,
           uint8_t C_i[len_C], uint8_t X_i[16]) {
    uint8_t m;
    uint8_t n;
    uint8_t i;
    uint8_t j;
    uint8_t zeros[16] = {0};
    if (i == m + n) {
        for(j=16; j>=0; j--){
            C_i[j] = C_i[j] ^ zeros[j]; //XOR with zero array to fill in 0 of length 128-u
            tmp[j] = X_i[j] ^ C_i[j]; // X[m+n+1] XOR C[i] left shift by (128bit-u) and store into tmp
            gmul(tmp, H, X_i); //Do Multiplication of tmp to H and store into X
        }
    }
}
I am pretty sure this is not correct, but I have no idea how to do it properly.
It seems to me that you've got several issues here, and conflating them is a big part of the problem. It'll be much easier when you separate them.
First: passing in a parameter of the form uint8_t len_A, uint8_t A_i[len_A] is legal C99, but it won't give you what you want. The array parameter is adjusted to a pointer, so you're actually getting uint8_t len_A, uint8_t * A_i, and the length of A_i is determined by how it was declared on the level above, not how you tried to pass it in. (Note that uint8_t * A and uint8_t A[] are functionally identical here; the difference is mostly syntactic sugar for the programmer.)
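A small sketch of that point, with hypothetical function names: inside the callee the "array" is only a pointer, so the length has to travel as a separate argument.

```c
#include <stddef.h>
#include <stdint.h>

/* The [16] documents intent only; the compiler adjusts the parameter to uint8_t *. */
static size_t size_seen_by_callee(uint8_t block[16]) {
    uint8_t **as_pointer = &block;  /* only type-checks because block is a pointer */
    return sizeof *as_pointer;      /* sizeof(uint8_t *), not 16 */
}

/* Passing the byte length explicitly tells the callee how much data there is. */
static unsigned xor_bytes(const uint8_t *data, size_t byte_len) {
    unsigned acc = 0;
    for (size_t k = 0; k < byte_len; ++k) {
        acc ^= data[k];
    }
    return acc;
}
```

Calling size_seen_by_callee() on a 16-byte buffer returns the size of a pointer on your platform, which is why sizeof cannot be used to recover the block length inside GHASH.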
On the level above, since I don't know if it was declared by malloc() or on the stack, I'm not going to get fancy with memory management issues. I'm going to use local storage for my suggestion.
Unit clarity: You've got a bad case of mixed units here: bit vs. byte vs. block length. Without knowing the core algorithm, it appears to me that the uninitialized m & n are block lengths of A & C; i.e., A is m blocks long, and C is n blocks long, and in both cases the last block is not required to be full length. You're passing in len_A & len_C without telling us (or using them in code so we can see) whether they're the bit length u/v, the byte length of A_i/C_i, or the total length of A/C, in bits or bytes or blocks. Based on the declaration, I'm assuming they're the length of A_i/C_i in bytes, but it's not obvious... nor is it the obvious thing to pass. By the name, I would have guessed it to be the length of A/C in bits. Hint: if your units are in the names, it becomes obvious when you try to add bitLenA to byteLenB.
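As an illustration of putting units in the names, here is a minimal sketch with hypothetical helpers. It assumes the data is a whole number of bytes, so every bit length is a multiple of 8.

```c
#include <stddef.h>

enum { BLOCK_BYTES = 16 };

/* Number of 16-byte blocks needed to hold byte_len bytes (the last may be partial). */
static size_t block_count_from_byte_len(size_t byte_len) {
    return (byte_len + BLOCK_BYTES - 1) / BLOCK_BYTES;
}

/* Bytes actually present in the final block: 1..16, or 0 for empty input. */
static size_t last_block_byte_len(size_t byte_len) {
    if (byte_len == 0) {
        return 0;
    }
    size_t rem = byte_len % BLOCK_BYTES;
    return rem ? rem : BLOCK_BYTES;
}

/* The v (or u) from the question: bit length of the final block. */
static size_t last_block_bit_len(size_t byte_len) {
    return 8 * last_block_byte_len(byte_len);
}
```

With 20 bytes of A, for example, this gives m = 2 blocks and v = 32 bits, and nobody has to guess which unit a variable holds.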
Iteration control: You appear to be passing in 16-byte blocks for the i'th iteration, but not passing in i. Either pass in i, or pass in the full A & C instead of A_i & C_i. You're also using m & n without setting them or passing them in; the same issue applies. I'll just pretend they're all correct at the moment of use and let you fix that.
Finally, I don't understand the summation notation for the i=m+n+1 case, in particular how len(A) & len(C) are treated, but you're not asking about that case so I'll ignore it.
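For what it's worth, the GCM specification defines len(A) and len(C) in that last step as the 64-bit big-endian bit lengths of A and C, so together they fill exactly one 128-bit block. A minimal sketch, assuming the caller tracks byte lengths (the function name is hypothetical):

```c
#include <stdint.h>

/* Write len(A) || len(C): two 64-bit big-endian bit counts, A first. */
static void make_len_block(uint64_t byte_len_a, uint64_t byte_len_c, uint8_t out[16]) {
    uint64_t bit_len_a = byte_len_a * 8;
    uint64_t bit_len_c = byte_len_c * 8;
    for (int k = 0; k < 8; ++k) {
        out[7 - k]  = (uint8_t)(bit_len_a >> (8 * k));  /* low byte lands at out[7] */
        out[15 - k] = (uint8_t)(bit_len_c >> (8 * k));  /* low byte lands at out[15] */
    }
}
```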
Given all that, let's look at your function:
void GHASH(uint8_t H[], uint8_t len_A, uint8_t A_i[], uint8_t len_C, uint8_t C_i[], uint8_t X_i[]) {
    // i, m and n are assumed to be visible and correct here; see the note above.
    uint8_t tmpAC[16] = {0};
    uint8_t tmp[16];
    uint8_t * pAC = tmpAC;
    uint8_t j;
    if (i == 0) { // Initialization case: X is always a full 16-byte block
        for (j=0; j<16; ++j) {
            X_i[j] = 0;
        }
        return;
    } else if (i < m) { // Use the input memory for A
        pAC = A_i;
    } else if (i == m) { // Use temp memory init'ed to 0; copy in A as far as it goes
        for (j=0; j<len_A; ++j) {
            pAC[j] = A_i[j];
        }
    } else if (i < m+n) { // Use the input memory for C
        pAC = C_i;
    } else if (i == m+n) { // Use temp memory init'ed to 0; copy in C as far as it goes
        for (j=0; j<len_C; ++j) {
            pAC[j] = C_i[j];
        }
    } else if (i == m+n+1) { // Do something unclear to me. Maybe this?
        // Use temp memory init'ed to 0; copy in len(A) & len(C)
        pAC[0] = len_A; // in blocks? bits? bytes?
        pAC[1] = len_C; // in blocks? bits? bytes?
    }
    for (j=0; j<16; ++j) { // Count up: an unsigned j is never < 0, and index 16 is out of bounds
        tmp[j] = X_i[j] ^ pAC[j]; // X[i-1] XOR the current A or C block, stored into tmp
    }
    gmul(tmp, H, X_i); // Multiply tmp by H once per block and store into X
}
We only copy memory in the last block of A or C, and use local memory for the copy. Most blocks are handled with a single pointer copy to point to the correct part of the input memory.
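If you pass in the full A or C instead of one block at a time, the same idea can be sketched as a loop that lives inside the function. Everything here is an assumption for illustration: ghash_absorb is a hypothetical name, and gf128_mul_sketch is a plain bit-by-bit stand-in for your gmul (slow and not constant-time), taking two inputs and one output in the same order as your gmul(tmp, H, X_i) call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for gmul: out = x * y in GF(2^128), one bit of y at a time. */
static void gf128_mul_sketch(const uint8_t x[16], const uint8_t y[16], uint8_t out[16]) {
    uint8_t z[16] = {0};
    uint8_t v[16];
    memcpy(v, x, 16);
    for (int i = 0; i < 128; ++i) {
        if (y[i / 8] & (0x80 >> (i % 8))) {   /* bit i of y, leftmost bit first */
            for (int k = 0; k < 16; ++k) z[k] ^= v[k];
        }
        int carry = v[15] & 1;                /* bit that falls off the right end */
        for (int k = 15; k > 0; --k) {
            v[k] = (uint8_t)((v[k] >> 1) | (v[k - 1] << 7));
        }
        v[0] >>= 1;
        if (carry) v[0] ^= 0xe1;              /* reduce by the GCM polynomial */
    }
    memcpy(out, z, 16);
}

/* Fold byte_len bytes of data into the running value X, 16 bytes at a time. */
static void ghash_absorb(uint8_t X[16], const uint8_t H[16],
                         const uint8_t *data, size_t byte_len) {
    uint8_t padded[16];
    uint8_t tmp[16];
    while (byte_len > 0) {
        size_t take = byte_len < 16 ? byte_len : 16;
        const uint8_t *block = data;          /* full block: use the input memory directly */
        if (take < 16) {                      /* short final block: copy and zero-pad */
            memset(padded, 0, sizeof padded);
            memcpy(padded, data, take);
            block = padded;
        }
        for (size_t j = 0; j < 16; ++j) tmp[j] = X[j] ^ block[j];
        gf128_mul_sketch(tmp, H, X);          /* one multiply per block */
        data += take;
        byte_len -= take;
    }
}
```

A caller would zero X, absorb all of A, absorb all of C, and then fold in the final length block in the same XOR-then-multiply way.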