
Did I find a libxml2 bug (memory leak in multi-threaded parsing)?

I am currently working on data-processing code that uses libxml2, and I am stuck on a memory leak that I cannot get rid of. Here is a minimal program that reproduces it:

#include <stdlib.h>
#include <stdio.h>
#include <libxml/parser.h>
#include <libxml/tree.h>
#include <omp.h>

int main(void)
{
    xmlDoc *doc;
    int tn;
    char fname[32];

    omp_set_num_threads(2);
    xmlInitParser();   /* initialise libxml2 once, before any thread parses */
    #pragma omp parallel private(doc,tn,fname)
    {
        /* each thread parses its own file: testdoc0.xml, testdoc1.xml, ... */
        tn  = omp_get_thread_num();
        snprintf(fname,sizeof fname,"testdoc%d.xml",tn);
        doc = xmlReadFile(fname,NULL,0);
        printf("document %s parsed on thread %d (%p)\n",fname,tn,(void *)doc);
        xmlFreeDoc(doc);   /* also safe if parsing failed and doc is NULL */
    }
    xmlCleanupParser();

    return EXIT_SUCCESS;
}

At runtime, the output is:

document testdoc0.xml parsed on thread 0 (0x1005413a0)
document testdoc1.xml parsed on thread 1 (0x1005543c0)

This confirms that the code really runs on two threads and that doc is private to each thread in the parallel region. Note that I followed the libxml2 thread-safety instructions (http://xmlsoft.org/threads.html). Valgrind reports:

HEAP SUMMARY:
    in use at exit: 9,000 bytes in 8 blocks
  total heap usage: 956 allocs, 948 frees, 184,464 bytes allocated

968 bytes in 1 blocks are definitely lost in loss record 6 of 8
   at 0x1000107AF: malloc (vg_replace_malloc.c:236)
   by 0x1000B2590: xmlGetGlobalState (in /opt/local/lib/libxml2.2.dylib)
   by 0x1000B1A18: __xmlDefaultSAXHandler (in /opt/local/lib/libxml2.2.dylib)
   by 0x100106D18: xmlDefaultSAXHandlerInit (in /opt/local/lib/libxml2.2.dylib)
   by 0x100041BE7: xmlInitParserCtxt (in /opt/local/lib/libxml2.2.dylib)
   by 0x100042145: xmlNewParserCtxt (in /opt/local/lib/libxml2.2.dylib)
   by 0x10004615E: xmlCreateURLParserCtxt (in /opt/local/lib/libxml2.2.dylib)
   by 0x10005B56B: xmlReadFile (in /opt/local/lib/libxml2.2.dylib)
   by 0x100000E03: main.omp_fn.0 (in ./xtest)
   by 0x100028FA3: gomp_thread_start (in /opt/local/lib/gcc44/libgomp.1.dylib)
   by 0x1001E8535: _pthread_start (in /usr/lib/libSystem.B.dylib)
   by 0x1001E83E8: thread_start (in /usr/lib/libSystem.B.dylib)

LEAK SUMMARY:
   definitely lost: 968 bytes in 1 blocks
   indirectly lost: 0 bytes in 0 blocks
     possibly lost: 0 bytes in 0 blocks
   still reachable: 8,032 bytes in 7 blocks
        suppressed: 0 bytes in 0 blocks
Reachable blocks (those to which a pointer was found) are not shown.
To see them, rerun with: --leak-check=full --show-reachable=yes

The leak shows up for me whatever XML document I use. I am using libxml2 2.7.8 on Mac OS X 10.6.5 with gcc 4.4.5.

Is anyone able to reproduce this bug?

Thanks,

Antonin

asked Jan 06 '11 16:01 by Antonin Portelli


2 Answers

From the web site you listed above (http://xmlsoft.org/threads.html):

Starting with 2.4.7, libxml2 makes provisions to ensure that concurrent threads can safely work in parallel parsing different documents.

Your example seems to call xmlReadFile on the same document (testdoc.xml) in each thread. The page further states:

Note that the thread safety cannot be ensured for multiple threads sharing the same document, the locking must be done at the application level ...

answered Oct 18 '22 09:10 by ejd


You should probably bring this up on the libxml2 mailing list.

http://mail.gnome.org/mailman/listinfo/xml

answered Oct 18 '22 09:10 by dgatwood
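A possible explanation, offered only as a hypothesis to test and not something confirmed in this thread: in the valgrind trace, the lost 968-byte block is allocated by xmlGetGlobalState on the OpenMP worker thread (under gomp_thread_start). If libxml2 keeps that per-thread state in POSIX thread-specific data with a key destructor (an assumption about its internals), POSIX only runs the destructor when the owning thread exits. An OpenMP runtime that keeps its worker threads alive for reuse (also an assumption, here about libgomp) would then never run it before the process ends. The sketch below uses plain pthreads, no libxml2, to show that timing: a worker allocates per-thread state, parks the way an idle pool thread might, and the destructor only runs once the worker is allowed to return and is joined. The names free_state, get_state, worker and demo are hypothetical, and the 968-byte size is copied from the trace purely for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_key_t state_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;     /* worker has allocated its per-thread state */
static int may_exit = 0;  /* main thread allows the worker to return */
static int released = 0;  /* blocks freed by the key destructor */

/* Key destructor: POSIX runs it when a thread holding a non-NULL value exits. */
static void free_state(void *p)
{
    free(p);
    pthread_mutex_lock(&lock);
    released++;
    pthread_mutex_unlock(&lock);
}

static void make_key(void)
{
    (void)pthread_key_create(&state_key, free_state);
}

/* Hypothetical stand-in for lazily allocated per-thread state. */
static void *get_state(void)
{
    void *s;

    pthread_once(&key_once, make_key);
    s = pthread_getspecific(state_key);
    if (s == NULL && (s = malloc(968)) != NULL)
        pthread_setspecific(state_key, s);
    return s;
}

/* Worker: allocate state, then park like an idle pool thread until released. */
static void *worker(void *arg)
{
    (void)arg;
    get_state();

    pthread_mutex_lock(&lock);
    ready = 1;
    pthread_cond_broadcast(&cond);
    while (!may_exit)
        pthread_cond_wait(&cond, &lock);
    pthread_mutex_unlock(&lock);
    return NULL;  /* thread exit: free_state() runs for this thread's value */
}

/* Reports how many blocks were freed while the worker was parked and after join. */
int demo(int *released_while_alive, int *released_after_join)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    pthread_mutex_lock(&lock);
    while (!ready)
        pthread_cond_wait(&cond, &lock);
    *released_while_alive = released;  /* worker still alive: destructor not run */
    may_exit = 1;
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&lock);

    pthread_join(tid, NULL);           /* destructors finish before join returns */

    pthread_mutex_lock(&lock);
    *released_after_join = released;
    pthread_mutex_unlock(&lock);
    return 0;
}
```

Calling demo(&alive, &joined) should leave alive at 0 (the parked worker still owns its block) and joined at 1 (the destructor ran as the worker exited). If this is the mechanism behind the report, one way to check would be to do the same parsing from pthreads that are joined before xmlCleanupParser() and see whether valgrind still flags the block; that experiment is a suggestion, not a result from this thread.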