I've installed FOSElasticaBundle and have it working with a cross section of my data.
My problem is that I have about 14 million rows that I need to index. I ran the populate command, and after about 6 hours yesterday it errored out at 10.8% with a memory error:
PHP Fatal error: Allowed memory size of 2147483648 bytes exhausted (tried to allocate 52277 bytes) in /var/www/html/vendor/monolog/monolog/src/Monolog/Formatter/LineFormatter.php on line 111
As you can see, I've set my PHP memory limit to 2G, which should be more than enough.
The last line of output before the error looked like this:
Populating index/entity, 10.8% (1315300/12186320), 36 objects/s (RAM : current=2045Mo peak=2047Mo)
The current and peak values were ticking up with every line, starting at around 30MB.
My assumption is that there is some sort of memory leak. Surely PHP's memory shouldn't be exhausted by this process. I've also tried the command with some extra parameters:
app/console fos:elastica:populate --no-debug --no-reset --env=prod
but as I watch it run, the current memory usage still ticks up.
Any thoughts on what might be going on here and what I can do to debug it? I found this discussion, which sounds like my problem but doesn't really present a good solution: https://github.com/FriendsOfSymfony/FOSElasticaBundle/issues/82. I'm using Doctrine and the default provider.
Thank you-
I'm not able to solve the memory leak entirely, but by running the command
app/console fos:elastica:populate --no-debug --no-reset --env=prod --offset=n
I've been able to populate in batches. I drastically cut down the amount of memory leaked by turning off the logger, using a solution from this page:
https://github.com/FriendsOfSymfony/FOSElasticaBundle/issues/273
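For reference, one common variant of that fix is to switch off Doctrine's SQL logging and profiling for the prod environment, so queries aren't accumulated in memory during the run. A minimal sketch, assuming a Symfony 2.x-style config layout (adjust the file path to your project):

# app/config/config_prod.yml
doctrine:
    dbal:
        logging: false
        profiling: false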
After setting my PHP memory_limit to 4G (!), I'm able to get more than 5 million records populated without error, so after a couple of batches I should be done with this process.
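For a one-off run, the limit can also be raised just for that CLI process instead of editing php.ini; the offset value below is only illustrative:

php -d memory_limit=4G app/console fos:elastica:populate --no-debug --no-reset --env=prod --offset=5000000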
Most solutions seem to involve writing a custom provider (see https://github.com/FriendsOfSymfony/FOSElasticaBundle/issues/457), but with a ridiculous memory_limit and the leak limited as much as possible, I didn't need to.
The main problem here is that everything is done in one process, so all entities have to be loaded into memory. It is done in chunks, but it still ends up loading all the data. There is not much you can do about it, because the problem is in the design.
The solution: the data could be split into chunks that are processed in parallel by separate processes. The worker processes may quit from time to time (they have to be restarted by Supervisord or a similar tool), freeing memory and resources. As a result, you get much better performance, better fault tolerance, and a smaller memory footprint.
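As a rough sketch of the restart side of this, a Supervisord program entry for such workers might look like the following; the worker command here is a placeholder, not something a particular bundle ships:

[program:elastica_populate_worker]
; placeholder worker command -- replace with your actual queue consumer
command=php /var/www/html/app/console app:populate-worker --env=prod --no-debug
numprocs=4
process_name=%(program_name)s_%(process_num)02d
autorestart=true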
There are many ways to implement this (using forks, pthreads, or message queues), but I personally suggest looking at enqueue/elastica-bundle. It improves the populate command by splitting the job into messages that are processed by worker processes.
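If you go that route, installation is a normal Composer require (see the bundle's README for configuration and how it hooks into the populate command):

composer require enqueue/elastica-bundle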