I have to run data analysis on about 2 million lines of data, each line about 250 bytes long, so about 500 megabytes in total. I am running the latest Rakudo on VirtualBox Linux with 4 GB of memory.
After about 8 hours I got a MoarVM panic due to running out of memory. How do I give more memory to MoarVM? Unfortunately I cannot break the 2 million lines into chunks and write them to files first, because part of the data analysis requires all 2 million lines at once.
MoarVM doesn't have its own upper limit on memory (unlike, for example, the JVM). Rather, it gives an "out of memory" or "memory allocation failed" error only when memory is requested from the operating system and that request is refused. That may be because of configured memory limits, or it may really be that there just isn't that much available RAM/swap space to satisfy the request that was made (likely if you haven't configured limits).
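To check whether a configured limit (rather than genuinely exhausted RAM/swap) is the cause, on Linux you can inspect the shell's virtual-memory limit; this is a sketch of one common check, not the only possible source of a limit:

```shell
# Check whether this shell has a virtual-memory limit configured
# (prints a size in kB, or "unlimited" if no limit is set).
ulimit -v

# If a limit is the problem, raise or remove it for this session, e.g.:
# ulimit -v unlimited
```

Limits can also be imposed by systemd units, containers, or cgroups, so an "unlimited" result here does not rule out every configured cap.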
It's hard to provide specific advice on what to try next given that there are few details of the program in the question, but some things that might help are:
- If you process the file line by line, make sure you aren't keeping the lines around. For example,

      for $fh.lines { ... }

  only needs to keep the Str for the line currently being processed in memory, while

      my @lines = $fh.lines;
      for @lines { ... }

  will keep all of the Str objects around.

- Specify :enc<ascii> or similar when opening the file, if the data allows it. This may lead to a smaller memory representation.

- Use native types where possible. If you declare my int8 @a and store a million elements, it takes 1 MB of memory; do that with my @a and each element will be a boxed object inside a Scalar container, which on a 64-bit machine could eat over 70 MB. The same applies if you have a class you make many instances of: you might be able to make some of its attributes native.

I suggest you tackle your problem in several steps:
1. Prepare two small sample files if you haven't already. Keep them very small: I suggest a 2,000-line file and a 20,000-line one. If you already have sample files of around those lengths, those will do. Run your program on each file, noting how long each run took and how much memory it used.
2. Update your question with your notes about duration and RAM use, plus links to your source code and to the sample files if possible.
3. Run the two sample files again, but using the profiler as explained here. See what there is to see and update your question.
If you don't know how to do any of these things, ask in the comments.
4. If all of the above is fairly easy, repeat for a 100,000-line file.
Then we should have enough data to give you better guidance.
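As a sketch of the profiling step, Rakudo's built-in profiler can be invoked from the command line; the script and sample file names below are hypothetical placeholders for your own:

```shell
# Profile a run on the small sample; the .html extension asks Rakudo
# to write an interactive HTML report you can open in a browser.
raku --profile=profile-2k.html analyse.raku sample-2k.txt
```

The report shows where time is spent and how many allocations each routine performs, which is usually enough to spot the code responsible for the memory growth.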