Huge memory leak with libxml2

Tags: c, libxml2

I'm writing an XML parser with libxml2. It is finished, but there is a pretty annoying memory problem. The program first gets some links from my database, and each of those links points to an XML file. I use curl to download them. The process is simple: I download a file, parse it, then move on to the next one.

The problem seems to appear when a parse finishes. Curl downloads the next file, but the previous XML does not seem to be freed; I guess libxml2 loads the whole document into RAM. By the time the last XML file is parsed, I am looking at a leak of about 2.6 GB (yes, some of these files are really big), and my machine only has 4 GB of RAM. It works for the moment, but more links will be added to the database in the future, so I must fix it now.

My code is very basic:

xmlDocPtr doc;
doc = xmlParseFile("data.xml");

/* code to parse the file... */

xmlFreeDoc(doc);

I tried using:

xmlCleanupParser();

but the documentation says: "It doesn't deallocate any document related memory." (http://xmlsoft.org/html/libxml-parser.html#xmlCleanupParser)

So my question is: does anybody know how to deallocate all this document-related memory?

Pwet asked May 20 '13 22:05


2 Answers

The problem is that you are looking at the statistics in the wrong way...

When a program starts, it obtains some memory from the OS for the heap. When it calls malloc (or a similar function), the C runtime hands out slices of that heap until it runs out. After that, it automatically asks the OS for more memory, possibly in larger blocks each time. When the program calls free, the freed memory is marked as available for later mallocs, but it is generally not returned to the OS.

You may think that this behavior is wrong, that the program is leaking, but it is not: the freed memory is accounted for, just not by the OS but in the C library layer of your application. The proof is that the memory for the second XML file does not add to that of the first: a file only increases the total if it is the largest one so far.

You may also think that if this memory is no longer used by this program, it is just wasted there and cannot be used by other processes. But that's not true: if the memory is not touched for a while and it is needed elsewhere, the OS virtual memory manager will swap it out and reuse it.

So, my guess is that actually you don't have a problem.

PS: What I've just described is not always true. In particular, many C libraries distinguish between small and large memory chunks and allocate them differently.
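
The behavior described above can be observed with a small experiment. The following is only a sketch under several assumptions that are not part of the original answer: it targets Linux with glibc, it reads the resident set size from /proc/self/statm, and it uses malloc_trim, a glibc extension that asks the allocator to give free pages back to the OS. The helper names resident_kib and heap_retention_demo are made up for illustration. Small blocks are used on purpose, since large blocks are often served differently, as the PS notes.

```c
#include <malloc.h>   /* malloc_trim is a glibc extension, not standard C */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Resident set size in KiB, read from /proc/self/statm (Linux only).
 * Returns -1 if the value cannot be read. */
static long resident_kib(void)
{
    long total_pages = 0, resident_pages = 0;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL)
        return -1;
    int fields = fscanf(f, "%ld %ld", &total_pages, &resident_pages);
    fclose(f);
    if (fields != 2)
        return -1;
    return resident_pages * (sysconf(_SC_PAGESIZE) / 1024);
}

/* Hypothetical demo: allocates `count` small blocks, frees all but the
 * last one, then asks glibc to hand free pages back to the OS.
 * Prints the resident size at each step. Returns the result of
 * malloc_trim (1 if memory was released, 0 if not), or -1 on bad
 * arguments or allocation failure. */
int heap_retention_demo(size_t count, size_t block_size)
{
    if (count == 0 || block_size == 0)
        return -1;

    char **blocks = malloc(count * sizeof *blocks);
    if (blocks == NULL)
        return -1;

    for (size_t i = 0; i < count; i++) {
        blocks[i] = malloc(block_size);
        if (blocks[i] == NULL) {
            while (i > 0)
                free(blocks[--i]);
            free(blocks);
            return -1;
        }
        memset(blocks[i], 0xAB, block_size);  /* touch the pages so they become resident */
    }
    printf("after malloc:      %ld KiB\n", resident_kib());

    /* Keep the last block alive. It typically sits near the top of the
     * heap, which stops the allocator from shrinking the heap by itself. */
    for (size_t i = 0; i + 1 < count; i++)
        free(blocks[i]);
    printf("after free:        %ld KiB\n", resident_kib());

    int released = malloc_trim(0);
    printf("after malloc_trim: %ld KiB\n", resident_kib());

    free(blocks[count - 1]);
    free(blocks);
    return released;
}
```

Calling heap_retention_demo(20000, 1024) from main prints three figures. On a typical glibc system the second is expected to stay close to the first and the third to be lower, but the exact numbers depend on the allocator version and its tuning, so treat them as indicative only.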

rodrigo answered Sep 21 '22 15:09



Late to the game, but I just found this post today. It could be useful for other readers too.

If you are parsing or generating large documents, you may consider the XmlReader and XmlWriter APIs. They drastically reduce memory usage: it stays almost constant no matter how large the input is.

http://xmlsoft.org/html/libxml-xmlreader.html
http://xmlsoft.org/html/libxml-xmlwriter.html

Pierre answered Sep 19 '22 15:09
