I have written a program which analyzes a project's source code and reports various issues and metrics based on the code.
To analyze the source code, I load the code files that exist in the project's directory structure and analyze the code from memory. The code goes through extensive processing before it is passed to other methods to be analyzed further.
The code is passed around to several classes when it is processed.
The other day I was running it on one of the larger project my group has, and my program crapped out on me because there was too much source code loaded into memory. This is a corner case at this point, but I want to be able to handle this issue in the future.
What would be the best way to avoid memory issues?
I'm thinking about loading the code, do the initial processing of the file, then serialize the results to disk, so that when I need to access them again, I do not have to go through the process of manipulating the raw code again. Does this make sense? Or is the serialization/deserialization more expensive then processing the code again?
I want to keep a reasonable level of performance while addressing this problem. Most of the time, the source code will fit into memory without issue, so is there a way to only "page" my information when I am low on memory? Is there a way to tell when my application is running low on memory?
Update: The problem is not that a single file fills memory, its all of the files in memory at once fill memory. My current idea is to rotate off the disk drive when I process them
Rehearsal is found to be the most frequently used strategy, followed by mental imagery, elaboration, mnemonics, and organization. Previous study also found that rehearsal is the memory strategy taught most often by teachers to their students (Moely et al., 1992).
Another way to decrease the memory usage of our data is to truncate numerical items in the data. For example, whenever we load a CSV into a column in a data frame, if the file contains numbers, it will store it as which takes 64 bytes to store one numerical value.
1.6GB is still manageable and by itself should not cause memory problems. Inefficient string operations might do it.
As you parse the source code your probably split it apart into certain substrings - tokens or whatver you call them. If your tokens combined account for entire source code, that doubles memory consumption right there. Depending on the complexity of the processing you do the mutiplier can be even bigger. My first move here would be to have a closer look on how you use your strings and find a way to optimize it - i.e. discarding the origianl after the first pass, compress the whitespaces, or use indexes (pointers) to the original strings rather than actual substrings - there is a number of techniques which can be useful here.
If none of this would help than I would resort to swapping them to and fro the disk
If the problem is that a single copy of your code causing you to fill the memory available then there are atleast two options.
You should also check if you are disposing of objects properly. Do you have memory problems due to old copies of objects being in memory?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With