 

Populating FosElasticaBundle running out of php memory, possible memory leak?

I've installed FOSElasticaBundle and have it working with a cross section of my data.

My problem arises in that I have about 14m rows that I need to use to build an index. I ran the populate command and after about 6 hours yesterday it errored out at 10.8% with a memory error:

PHP Fatal error:  Allowed memory size of 2147483648 bytes exhausted (tried to allocate 52277 bytes) in /var/www/html/vendor/monolog/monolog/src/Monolog/Formatter/LineFormatter.php on line 111

As you can see, I've set my PHP memory limit to 2G, which should be more than enough.

The last line before the error looked like this:

Populating index/entity, 10.8% (1315300/12186320), 36 objects/s (RAM : current=2045Mo peak=2047Mo)

And the current and peak were ticking up with every line, starting around 30mb.

My assumption here is that there is some sort of memory leak. Surely PHP's memory shouldn't be exhausted by this process. I've also tried the command with some extra parameters:

app/console fos:elastica:populate --no-debug --no-reset --env=prod

but as I watch it run, the current memory still ticks up.

Any thoughts on what might be going on here and what I can do to debug it? I found this discussion, which sounds like my problem but doesn't really present a good solution: https://github.com/FriendsOfSymfony/FOSElasticaBundle/issues/82. I'm using Doctrine and the default provider.

Thank you-

Asked Jun 27 '14 by Pez

2 Answers

I'm not able to solve the memory leak entirely, but by running the command

app/console fos:elastica:populate --no-debug --no-reset --env=prod --offset=n

I've been able to populate in batches. I also drastically cut down the amount of memory leaking by turning off the logger, using a solution from this page:

https://github.com/FriendsOfSymfony/FOSElasticaBundle/issues/273
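In case the link goes stale: the gist (a sketch, assuming the Doctrine SQL logger is the main culprit, since with debug/profiling on it keeps every executed query in memory) is to switch query logging off on the connection before populating:

$em->getConnection()->getConfiguration()->setSQLLogger(null); // $em is your Doctrine EntityManager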

With my PHP memory_limit set to 4G (!), I'm able to get more than 5m records populated without error, so after a couple of batches I should be done with this process.
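If you'd rather not raise the limit globally in php.ini, PHP's standard -d flag sets it for a single run (same command as above):

php -d memory_limit=4G app/console fos:elastica:populate --no-debug --no-reset --env=prod --offset=n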

Most solutions seem to involve writing a custom provider (see https://github.com/FriendsOfSymfony/FOSElasticaBundle/issues/457), but with a ridiculously high memory_limit and the leak reined in as much as possible, I didn't need to.
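For reference, the custom-provider solutions in that thread mostly boil down to clearing Doctrine's identity map between batches so processed entities can be garbage-collected. A rough sketch of that pattern, assuming $em is the Doctrine EntityManager and $persister is the bundle's object persister for your type (AcmeBundle\Entity\Thing is a placeholder, not the bundle's actual provider code):

$i = 0;
foreach ($em->createQuery('SELECT e FROM AcmeBundle\Entity\Thing e')->iterate() as $row) {
    $persister->insertOne($row[0]);   // index one entity at a time
    if (++$i % 500 === 0) {
        $em->clear();                 // detach processed entities so PHP can free them
        gc_collect_cycles();
    }
}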

Answered by Pez


The main problem here is that everything is done in one process, so all entities have to be loaded in memory. It is done in chunks, but it still ends up loading all the data. There is not much you can do about it, because the problem is in the design.

The solution: split the data into chunks that are processed in parallel by separate processes. The worker processes can quit from time to time (and be restarted by Supervisord or a similar tool), freeing their memory and resources. As a result, you get much better performance, better fault tolerance, and a smaller memory footprint.

There are many ways to implement this (forks, pthreads, or message queues), but I personally suggest looking at enqueue/elastica-bundle. It improves the populate command by splitting the job into messages that are processed by workers.
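To make the design concrete, here is a hypothetical sketch (not enqueue/elastica-bundle's actual code; the script name, entity class, and persister service id are placeholders): each worker boots the kernel, indexes one fixed slice of rows, and exits, so the OS reclaims all of its memory; Supervisord or a plain shell loop then launches the next slice.

<?php
// index_slice.php <offset> <limit> -- hypothetical one-slice worker (Symfony 2 layout)
require_once __DIR__.'/app/bootstrap.php.cache';
require_once __DIR__.'/app/AppKernel.php';

$offset = isset($argv[1]) ? (int) $argv[1] : 0;
$limit  = isset($argv[2]) ? (int) $argv[2] : 50000; // keep the slice small enough to fit in memory

$kernel = new AppKernel('prod', false);
$kernel->boot();
$container = $kernel->getContainer();

$em = $container->get('doctrine')->getManager();
$em->getConnection()->getConfiguration()->setSQLLogger(null); // don't buffer queries

// Placeholder id, following the fos_elastica.object_persister.<index>.<type> convention
$persister = $container->get('fos_elastica.object_persister.index.entity');

$entities = $em->createQuery('SELECT e FROM AcmeBundle\Entity\Thing e ORDER BY e.id')
    ->setFirstResult($offset)
    ->setMaxResults($limit)
    ->getResult();

$persister->insertMany($entities);
// Exit; the supervisor restarts the script with the next offset.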

Answered by Maksim Kotlyar