 

Memory leak with PyYAML

I think I'm having a memory leak when loading a .yml file with the PyYAML library.

Here is what I've done:

import yaml
d = yaml.load(open(filename, 'r'))

The memory used by the process (measured with top or htop) has grown from 60 KB to 160 MB, while the file itself is smaller than 1 MB.

Then, I've run the following:

sys.getsizeof(d)

And it has returned a value of less than 400 KB.

I've also tried running the garbage collector with gc.collect(), but nothing happened.

As you can see, it seems that there's a memory leak, but I don't know what is causing it, nor how to free that memory.

Any idea?

asked Oct 30 '22 by joanlopez

1 Answer

Your approach doesn't show a memory leak; it just shows that PyYAML uses a lot of memory while processing a moderately sized YAML file.
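As an aside, sys.getsizeof only reports the shallow size of the top-level container, not the nested keys and values it references, which is why it reports far less than the process footprint. A rough illustration (deep_getsizeof is a hypothetical helper for this sketch, not part of PyYAML or the stdlib):

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Recursively sum sys.getsizeof over an object and everything it references."""
    if seen is None:
        seen = set()
    if id(obj) in seen:        # avoid counting shared objects twice
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    return size

# A small nested structure, standing in for a loaded YAML document:
d = {"a": [1, 2, 3], "b": {"c": "x" * 100}}
shallow = sys.getsizeof(d)   # size of the outer dict only
deep = deep_getsizeof(d)     # outer dict plus everything inside it
```

The deep size is always larger than the shallow one, and for a real parsed document the gap can be huge.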

If you were to do:

import yaml

X = 10
for x in range(X):
    with open(filename) as f:   # close the file each iteration
        d = yaml.safe_load(f)

and the memory used by the program changed depending on what you set X to, then there would be reason to suspect a memory leak.
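A cross-platform way to run this check without watching top is the stdlib tracemalloc module. In this sketch json.loads stands in for yaml.safe_load (an assumption made only so the example runs without PyYAML installed; the pattern is identical):

```python
import json
import tracemalloc

# Build a moderately nested document to parse repeatedly.
text = json.dumps({"key%d" % i: list(range(50)) for i in range(200)})

tracemalloc.start()
sizes = []
for x in range(5):
    d = json.loads(text)                      # rebinding d frees the previous parse
    current, _peak = tracemalloc.get_traced_memory()
    sizes.append(current)
tracemalloc.stop()

# If parsing leaked, sizes would climb with every iteration;
# here the later measurements stay close to the first one.
```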

In the tests I ran, this was not the case. It is just that the default Loader and SafeLoader take about 330× the file size in memory (based on an arbitrary 1 MB simple YAML file, i.e. one without tags), and the CLoader about 145× the file size.

Loading the YAML data multiple times doesn't increase that, so load() gives back the memory it uses; there is no memory leak.
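You can see the same give-back behaviour with tracemalloc: once the last reference to the parsed data is dropped, CPython's reference counting frees it immediately, no gc.collect() required for non-cyclic data. (The nested dict below is an illustrative stand-in for a loaded YAML document.)

```python
import tracemalloc

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()

# Stand-in for the result of a large yaml.safe_load() call:
data = {i: list(range(100)) for i in range(1000)}
during, _ = tracemalloc.get_traced_memory()

del data  # dropping the last reference frees the whole structure
after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()
```

Note that the OS-level figure in top may not shrink right away even after Python frees the objects, since the allocator can hold freed pages for reuse.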

That is not to say it isn't an enormous amount of overhead.

(I am using safe_load() because PyYAML's documentation indicates that load() is not safe on uncontrolled input files.)

answered Nov 15 '22 by Anthon