
C getline memory leak different behaviours

I have a question about getline(), which seems to behave differently with respect to memory usage in two scenarios, as reported by valgrind. I've posted the code for both cases below and described the behavior of each. I hope somebody can point me in the right direction.

First case

getline() is called in a while loop, reading every line of a text file into a buffer. The buffer is freed only once, after the loop ends; in this case valgrind reports no errors (no leaks occur).

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[])
{
    char* buffer = NULL;
    size_t bufsize = 0;
    ssize_t nbytes;
    int counter = 0;
    char error = 0;

    FILE* input_fd = fopen(argv[1], "r");
    if (input_fd == NULL)
        return 1;   // could not open the input file

    while ((nbytes = getline(&buffer, &bufsize, input_fd)) != -1)
    {
        counter += 1;
    }

    free(buffer);
    fclose(input_fd);

    return 0;
}

Second case

The same loop calls a function that, in turn, calls getline(), passing the same buffer. Again, the buffer is freed only once, after the loop ends, but in this case valgrind reports a memory leak. Indeed, when I run the program and watch its RSS, I can see it grow as the loop goes on. Note that if I add a free() inside the loop (freeing the buffer on every iteration), the problem disappears. Here's the code.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int my_getline(FILE* lf_fd, char** lf_buffer)
{
    ssize_t lf_nbytes = 0;
    size_t lf_bufsiz = 0;
    lf_nbytes = getline(lf_buffer, &lf_bufsiz, lf_fd);
    if (lf_nbytes == -1)
        return 1;
    return 0;
}

int main(int argc, char* argv[])
{
    char* lf_buffer = NULL;
    size_t bufsize = 0;
    ssize_t nbytes;
    int counter = 0;
    int new_line_counter = 0;
    char error = 0;

    FILE* lf_fd = fopen(argv[1], "r");

    while ((my_getline(lf_fd, &lf_buffer)) == 0)
    {
        // Added to allow measuring the RSS
        sleep(2);
   
        // If I uncomment this, no memory leak occurs
        //free(lf_buffer);
    }

    free(lf_buffer);
    fclose(lf_fd);

    return 0;
}

Valgrind output

==9604== Memcheck, a memory error detector
==9604== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9604== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==9604== Command: ./my_getline_x86 /media/sf_Scambio/processes.log
==9604== HEAP SUMMARY:
==9604==     in use at exit: 1,194 bytes in 2 blocks
==9604==   total heap usage: 8 allocs, 6 frees, 11,242 bytes allocated
==9604== 
==9604== 1,194 bytes in 2 blocks are definitely lost in loss record 1 of 1
==9604==    at 0x483DFAF: realloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==9604==    by 0x48E371D: getdelim (iogetdelim.c:102)
==9604==    by 0x1092B3: my_getline (my_getline.c:14)
==9604==    by 0x10956A: main (my_getline.c:38)
==9604== 
==9604== LEAK SUMMARY:
==9604==    definitely lost: 1,194 bytes in 2 blocks
==9604==    indirectly lost: 0 bytes in 0 blocks
==9604==      possibly lost: 0 bytes in 0 blocks
==9604==    still reachable: 0 bytes in 0 blocks
==9604==         suppressed: 0 bytes in 0 blocks
==9604== 
==9604== For lists of detected and suppressed errors, rerun with: -s
==9604== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
bui3 asked Mar 02 '23 23:03



1 Answer

The first program is fine.

The issue with the second one comes from the buffer-size argument to getline(). Your my_getline() resets it to 0 on every call, so getline() allocates a new buffer each time and the previous one is never freed (at least with the glibc implementation you're using; see below). Change it to

int my_getline(FILE* lf_fd, char** lf_buffer, size_t* lf_bufsiz)
{
    ssize_t lf_nbytes = 0;
    lf_nbytes = getline(lf_buffer, lf_bufsiz, lf_fd);
    if (lf_nbytes == -1)
        return 1;
    return 0;
}

and, when calling it, pass a pointer to a size_t variable that was initialized to 0 before the first call. The existing bufsize variable in main() looks like it would be appropriate to use:

//...
while ((my_getline(lf_fd, &lf_buffer, &bufsize)) == 0)
// ...
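
One way to make it harder for the buffer and its size to drift apart is to keep them in a single object. The following is only a sketch of that idea: struct line_buf, line_buf_read(), line_buf_free() and count_lines() are hypothetical names that do not appear in the original code, and the return convention (0 on success, 1 on EOF or error) simply mirrors my_getline().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper type (not in the original post): keeps the buffer
 * and the capacity getline() reported for it side by side. */
struct line_buf {
    char  *data;
    size_t cap;
};

/* Reads one line into lb. Returns 0 on success and 1 on EOF or error,
 * mirroring my_getline(). The capacity is never reset between calls. */
static int line_buf_read(struct line_buf *lb, FILE *fp)
{
    assert(lb != NULL && fp != NULL);
    return getline(&lb->data, &lb->cap, fp) == -1 ? 1 : 0;
}

/* Releases the buffer and returns lb to its initial (NULL, 0) state. */
static void line_buf_free(struct line_buf *lb)
{
    free(lb->data);
    lb->data = NULL;
    lb->cap = 0;
}

/* Counts the lines in fp, freeing the buffer exactly once at the end. */
static int count_lines(FILE *fp)
{
    struct line_buf lb = { NULL, 0 };
    int counter = 0;

    while (line_buf_read(&lb, fp) == 0)
        counter += 1;

    line_buf_free(&lb);
    return counter;
}
```

In main() this would be used as count_lines(lf_fd) after a successful fopen(). Because the capacity lives next to the pointer, a wrapper cannot accidentally hand getline() a stale size of 0 for a buffer that is already allocated.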

While it was easy to work around, the memory leak you encountered appears to be a bug in the glibc implementation of getline().

From the POSIX documentation:

If *lineptr is a null pointer or if the object pointed to by *lineptr is of insufficient size, an object shall be allocated as if by malloc() or the object shall be reallocated as if by realloc(), respectively, such that the object is large enough to hold the characters to be written to it...

and the glibc manpage:

Alternatively, before calling getline(), *lineptr can contain a pointer to a malloc(3)-allocated buffer *n bytes in size. If the buffer is not large enough to hold the line, getline() resizes it with realloc(3), updating *lineptr and *n as necessary.

These suggest that in the case you're running into, where you pass a valid non-NULL pointer and claim its length is 0, the function should use realloc() to resize it. However, the glibc implementation checks *lineptr == NULL || *n == 0 and, if that is true, overwrites *lineptr with a newly allocated buffer, causing the leak you saw. Compare the NetBSD implementation, which uses realloc() for all allocation (realloc(NULL, x) is equivalent to malloc(x)) and thus won't leak with your original code. That isn't ideal, because your original code then triggers a realloc() on every call instead of only when the buffer is too small for the current line (unlike the fixed version above), but it works.
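
The difference between the two strategies can be illustrated with a simplified model. This is not the actual glibc or NetBSD source: ensure_capacity_malloc(), ensure_capacity_realloc() and INITIAL_SIZE are hypothetical names and values chosen only to show why one approach orphans an existing block when *n is 0 and the other does not.

```c
#include <assert.h>
#include <stdlib.h>

#define INITIAL_SIZE 128   /* hypothetical starting capacity */

/* Model of the "malloc when *n == 0" strategy. If *lineptr already points
 * at a live block, that block is orphaned when the pointer is overwritten. */
static int ensure_capacity_malloc(char **lineptr, size_t *n, size_t needed)
{
    assert(lineptr != NULL && n != NULL);
    if (*lineptr == NULL || *n == 0) {
        char *fresh = malloc(INITIAL_SIZE);
        if (fresh == NULL)
            return -1;
        *lineptr = fresh;          /* the previous pointer value is lost */
        *n = INITIAL_SIZE;
    }
    if (*n < needed) {
        char *bigger = realloc(*lineptr, needed);
        if (bigger == NULL)
            return -1;
        *lineptr = bigger;
        *n = needed;
    }
    return 0;
}

/* Model of the "always realloc" strategy. realloc(NULL, x) behaves like
 * malloc(x), and an existing block is resized rather than replaced. */
static int ensure_capacity_realloc(char **lineptr, size_t *n, size_t needed)
{
    assert(lineptr != NULL && n != NULL);
    if (needed == 0)
        needed = 1;                /* avoid realloc(ptr, 0) */
    if (*lineptr == NULL || *n < needed) {
        char *bigger = realloc(*lineptr, needed);
        if (bigger == NULL)
            return -1;
        *lineptr = bigger;
        *n = needed;
    }
    return 0;
}
```

With the first function, a caller that passes a live pointer together with *n == 0 gets back a different pointer while the old block stays allocated, which is the pattern valgrind reported as "definitely lost". With the second, the old block is handed to realloc(), so nothing is orphaned.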

Shawn answered Mar 12 '23 00:03
