
Two printfs print the same string differently

I am trying to create a library that handles big integer arithmetic. Big integers are stored in a struct:

typedef struct BigInt BigInt;
struct BigInt
{
    uint32_t size;
    uint32_t *data;
};

The first member is a uint32_t containing the length of the number (in 32-bit words), and the second member is a pointer to the actual number data (stored in two's complement). I have written a simple toHex(BigInt *a) function that allocates memory, prints the hexadecimal value of the big integer to the string, and returns the address.

In my main loop, I have the following:

int main(int argc, char *argv[])
{
    char *ap, *bp;
    BigInt *a = fromUInt32(0x7fffffff), *b = fromUInt32(1), *c = fromUInt32(0x80000000);
    _add(a, b);
    ap = toHex(a);
    bp = toHex(c);
    printf("%s\n", ap);
    printf("%s\n%s\n", ap, bp);
    printf("%s\n%s\n", ap, bp);
    free(ap);
    free(bp);
    deleteBigInt(a);
    deleteBigInt(b);
    deleteBigInt(c);
}

which, curiously enough, prints

0000000080000000
0
0000000080000000
0000000080000000
0000000080000000

So the second printf statement prints something different for ap than the first and third printf statements. It seems the first printf statement is correct, and the second one is messing up. I have stepped through my code with GDB, and after the evaluation of toHex, ap points to the string "0000000080000000", terminated by a null byte.

I am completely baffled. As far as I can see, the possibilities are:
1. I have run into undefined behaviour for some weird reason.
2. In _add I call a routine written in x86 assembly code, there may be an error in it (but I do adhere to GCC's calling conventions by preserving esi, edi, ebx, ebp, and esp).
3. There is a bug in printf, which seems very unlikely.

Also I have an obvious "memory leak" (quoted because the opinions on what a memory leak exactly is seem to differ) by not freeing the memory allocated by toHex, but this should not matter.

My toHex function was requested by Sourav Ghosh, and is as follows:

char numToHex[] = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };
char *toHex(BigInt *a)
{
    char *result, *ptr;
    // allocate enough space for 8 characters for each uint32_t and 1 terminating 0
    ptr = result = malloc(a->size * 8 + 1);
    // loop over the uint32_t's stored in a->data
    // (there are a->size of them)
    for (uint32_t i = 0; i < a->size; i++)
        // parse 8 blocks of 4 bits
        for (uint32_t j = 0; j < 8; j++)
            // grab the right bits and convert them to a hex digit
            *(ptr++) = numToHex[(a->data[i] >> ((7 - j) * 4)) & 0xf];
    // add a terminating zero byte
    *ptr = 0;
    return result;
}

I have isolated this weird behaviour in a program of ~100 lines of C + ~70 lines of assembly. Compiling can be done with

nasm -f elf -s <AssemblyName>.asm
gcc <CFile>.c <AssemblyName>.o -o <OutputProgram> -m32 -std=c99 -g

The code is uncommented and meant for people who want to inspect the behaviour for themselves.

EDIT: Jan Spurny and Matt McNabb urged me to use Valgrind. Valgrind says:

Invalid read of size 1
   at 0x40A5685: vfprintf (vfprintf.c:1655)
   by 0x40AA7FE: printf (printf.c:34)
   by 0x4075904: (below main) (libc-start.c:260)
 Address 0x42121af is 1 bytes before a block of size 17 alloc'd
   at 0x40299D8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
   by 0x804887D: toHex (weird.c:107)
   by 0x8048565: main (weird.c:30)

But this doesn't make sense, as I set result to the return value of malloc in toHex, and didn't change anything after that. My bet now is that some register is getting corrupted in the assembly function.

EDIT 2: After checking with GDB, I can see that no registers are corrupted. I am still clueless.

asked Mar 16 '15 12:03 by Ruben



1 Answer

The reduce function has a bug:

while (i < a->size && !(a->data[i])) i++;
if (a->data[i] & SIGNBIT) i--;

If the loop stops because i has reached a->size (that is, every word is zero), then a->data[i] reads one element past the end of the array, which is undefined behaviour. The other branch of reduce has the same problem.


There's a bug in the _add function (although this is not triggered in your test case):

void *k = realloc(a->data, b->size * 4);
memmove((void *)(a->data + displacement), (void *)a->data, a->size * 4);
// ....other code using `a->data`

If realloc succeeds, the old value of a->data becomes indeterminate (the block may have been moved and the old one freed), so using it afterwards is undefined behaviour. This could explain your symptoms, as a later allocation might reuse the freed block that a->data still points to.

Maybe you meant to also have a line a->data = k; after this?
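As a rough sketch of the usual realloc pattern (not your actual _add code): keep the result in a temporary, check it, and only then overwrite a->data. The helper name growBigInt, its return convention, and the most-significant-word-first layout are assumptions made for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct BigInt BigInt;
struct BigInt
{
    uint32_t size;
    uint32_t *data;
};

/* Hypothetical helper: grow a to newSize words and move the existing words to
   the end of the buffer. Returns 0 on success, -1 if realloc fails (a is then
   left unchanged and still owns its old block). */
int growBigInt(BigInt *a, uint32_t newSize)
{
    assert(a != NULL && newSize > 0 && newSize >= a->size);
    uint32_t displacement = newSize - a->size;
    uint32_t *k = realloc(a->data, (size_t)newSize * sizeof *k);
    if (k == NULL) return -1;
    // from here on only the new pointer may be used
    a->data = k;
    memmove(a->data + displacement, a->data, (size_t)a->size * sizeof *k);
    // sign-extend into the new leading words (two's complement)
    uint32_t fill = (a->size > 0 && (a->data[displacement] & 0x80000000u)) ? 0xffffffffu : 0;
    for (uint32_t i = 0; i < displacement; i++) a->data[i] = fill;
    a->size = newSize;
    return 0;
}
```

Assigning to a->data only after the NULL check also avoids leaking the old block when realloc fails.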


To get good help with debugging your code it would be great if you could do the following:

  • Check the result of all *alloc-family functions and exit if NULL is returned. Otherwise you get undefined behaviour (it's not reliable to expect a segfault).
  • Rewrite the assembly function in C. This is a good idea for a number of reasons (debugging, code portability, optimization). It might even turn out that gcc -O3 generates faster code than your handwritten version; that's what compilers are good at.
  • Inspect the result of calling newAddress to check it actually returns what you expected in your test case.
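
For the first point, one common approach is a pair of small wrappers that terminate the program when allocation fails, so no caller can forget the check. The names xmalloc and xrealloc are a convention rather than standard functions, and this is only a sketch:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocate n bytes or terminate with a message. */
void *xmalloc(size_t n)
{
    // malloc(0) may legitimately return NULL, so request at least one byte
    void *p = malloc(n ? n : 1);
    if (p == NULL)
    {
        fprintf(stderr, "out of memory (%zu bytes requested)\n", n);
        exit(EXIT_FAILURE);
    }
    return p;
}

/* Resize a block to n bytes or terminate with a message. */
void *xrealloc(void *old, size_t n)
{
    // realloc(p, 0) is implementation-defined, so refuse it here
    assert(n > 0);
    void *p = realloc(old, n);
    if (p == NULL)
    {
        fprintf(stderr, "out of memory (%zu bytes requested)\n", n);
        exit(EXIT_FAILURE);
    }
    return p;
}
```

In toHex, for example, the allocation would then read ptr = result = xmalloc(a->size * 8 + 1);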

answered Sep 19 '22 15:09 by M.M