I think I have a memory leak when loading a .yml file
with the PyYAML library.
I followed these steps:
import yaml
d = yaml.load(open(filename, 'r'))
The memory used by the process (as reported by top or htop) grew from 60K to 160M, while the file itself is smaller than 1M.
Then I ran this command:
sys.getsizeof(d)
It returned a value below 400K.
I've also tried running the garbage collector with gc.collect(), but nothing happened.
As you can see, it seems that there's a memory leak, but I don't know what is causing it, nor do I know how to free that memory.
Any idea?
Your approach doesn't show a memory leak; it just shows that PyYAML uses a lot of memory while processing a moderately sized YAML file.
If you were to do:
import yaml
X = 10
for x in range(X):
    d = yaml.safe_load(open(filename, 'r'))
and the memory used by the program changed depending on what you set X
to, then there would be reason to assume a memory leak.
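A runnable sketch of that experiment, using tracemalloc to track Python-level allocations. Since PyYAML may not be installed everywhere, the standard-library json.loads stands in for yaml.safe_load here (swap it in if you have PyYAML); the point is that peak allocation stays roughly flat no matter how many times you load.

```python
import json
import tracemalloc

def peak_alloc(load, text, repeats):
    # Peak Python-level allocation (bytes) while calling load() repeats times.
    tracemalloc.start()
    d = None
    for _ in range(repeats):
        d = load(text)
    peak = tracemalloc.get_traced_memory()[1]
    tracemalloc.stop()
    return peak

# A moderately nested document; json.loads stands in for yaml.safe_load.
doc = json.dumps({"key%d" % i: list(range(50)) for i in range(500)})

p1 = peak_alloc(json.loads, doc, 1)
p10 = peak_alloc(json.loads, doc, 10)
# No leak: the peak with 10 loads stays close to the peak with 1, not 10x it,
# because each previous result is freed when d is rebound.
```

If the ten-iteration peak were anywhere near ten times the single-load peak, that would point at a leak; in practice it is only slightly higher, because at most two parsed copies exist at once.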
In the tests I ran, this is not the case. It is just that the default Loader and SafeLoader take about 330x the file size in memory (based on an arbitrary 1 MB simple YAML file, i.e. one without tags), and the CLoader about 145x the file size.
Loading the YAML data multiple times doesn't increase that, so load()
gives back the memory it uses, which means there is no memory leak.
That is not to say that it isn't an enormous amount of overhead.
(I am using safe_load() because PyYAML's documentation indicates that load() is not safe on uncontrolled input files.)
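As an aside on the sys.getsizeof(d) check in the question: getsizeof is shallow and only counts the top-level object, so it badly understates the footprint of a nested structure. A hedged sketch of a recursive variant (deep_getsizeof is my own helper, not a stdlib function):

```python
import sys

def deep_getsizeof(obj, seen=None):
    # Rough recursive size; sys.getsizeof alone ignores container contents.
    seen = seen if seen is not None else set()
    if id(obj) in seen:           # avoid double-counting shared objects
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(x, seen) for x in obj)
    return size

d = {"a": list(range(100)), "b": "x" * 1000}
# The deep size is far larger than the shallow number getsizeof reports,
# which is why the 400K figure in the question is not the real footprint.
```

Even this still won't match what top reports, since the process's resident size also includes the interpreter itself and memory the allocator keeps around after objects are freed.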