I am experiencing problem in my application with OutOfMemoryException. My application can search for words within texts. When I start a long running process search to search about 2000 different texts for about 2175 different words the application will terminate at about 50 % through with a OutOfMemoryException (after about 6 hours of processing)
I have been trying to find the memory leak. I have an object graph like: (--> are references)
a static global application object (controller) --> an algorithm starter object --> text mining starter object --> text mining algorithm object (this object performs the searching).
The text mining starter object will start the text mining algorithm object's run()-method in a separate thread.
To try to fix the issue I have edited the code so that the text mining starter object will split the texts to search into several groups and initialize one text mining algorithm object for each group of texts sequentially (so when one text mining algorithm object is finished a new will be created to search the next group of texts). Here I set the previous text mining algorithm object to null. But this does not solve the issue.
When I create a new text mining algorithm object I have to give it some parameters. These are taken from properties of the previous text mining algorithm object before I set that object to null. Will this prevent garbage collection of the text mining algorithm object?
Here is the code for the creation of new text mining algorithm objects by the text mining algorithm starter:
private void RunSeveralAlgorithmObjects()
{
IEnumerable<ILexiconEntry> currentEntries = allLexiconEntries.GetGroup(intCurrentAlgorithmObject, intNumberOfAlgorithmObjectsToUse);
algorithm.LexiconEntries = currentEntries;
algorithm.Run();
intCurrentAlgorithmObject++;
for (int i = 0; i < intNumberOfAlgorithmObjectsToUse - 1; i++)
{
algorithm = CreateNewAlgorithmObject();
AddAlgorithmListeners();
algorithm.Run();
intCurrentAlgorithmObject++;
}
}
private TextMiningAlgorithm CreateNewAlgorithmObject()
{
TextMiningAlgorithm newAlg = new TextMiningAlgorithm();
newAlg.SortedTermStruct = algorithm.SortedTermStruct;
newAlg.PreprocessedSynonyms = algorithm.PreprocessedSynonyms;
newAlg.DistanceMeasure = algorithm.DistanceMeasure;
newAlg.HitComparerMethod = algorithm.HitComparerMethod;
newAlg.LexiconEntries = allLexiconEntries.GetGroup(intCurrentAlgorithmObject, intNumberOfAlgorithmObjectsToUse);
newAlg.MaxTermPercentageDeviation = algorithm.MaxTermPercentageDeviation;
newAlg.MaxWordPercentageDeviation = algorithm.MaxWordPercentageDeviation;
newAlg.MinWordsPercentageHit = algorithm.MinWordsPercentageHit;
newAlg.NumberOfThreads = algorithm.NumberOfThreads;
newAlg.PermutationType = algorithm.PermutationType;
newAlg.RemoveStopWords = algorithm.RemoveStopWords;
newAlg.RestrictPartialTextMatches = algorithm.RestrictPartialTextMatches;
newAlg.Soundex = algorithm.Soundex;
newAlg.Stemming = algorithm.Stemming;
newAlg.StopWords = algorithm.StopWords;
newAlg.Synonyms = algorithm.Synonyms;
newAlg.Terms = algorithm.Terms;
newAlg.UseSynonyms = algorithm.UseSynonyms;
algorithm = null;
return newAlg;
}
Here is the start of the thread that is running the whole search process:
// Run the algorithm in it's own thread
Thread algorithmThread = new Thread(new ThreadStart
(RunSeveralAlgorithmObjects));
algorithmThread.Start();
Can something here prevent the previous text mining algorithm object from being garbage collected?
Excessive garbage collection activity can occur due to a memory leak in the Java application. Insufficient memory allocation to the JVM can also result in increased garbage collection activity. And when excessive garbage collection activity happens, it often manifests as increased CPU usage of the JVM!
The finalize() method is called by Garbage Collector, not JVM. However, Garbage Collector is one of the modules of JVM. Object class finalize() method has an empty implementation. Thus, it is recommended to override the finalize() method to dispose of system resources or perform other cleanups.
It is controlled by a thread known as Garbage Collector. Java provides two methods System. gc() and Runtime. gc() that sends request to the JVM for garbage collection.
Garbage Collection is process of reclaiming the runtime unused memory automatically. In other words, it is a way to destroy the unused objects. To do so, we were using free() function in C language and delete() in C++. But, in java it is performed automatically.
I recommend first identifying what exactly is leaking. Then postulate a cause (such as references in event handlers).
To identify what is leaking:
Properties -> Debug ->
check Enable unmanaged code debugging
.Debug -> Break All
.Debug -> Windows -> Immediate
Type one of the following into the immediate window, depending on whether you're running 32 or 64 bit, .NET 2.0/3.0/3.5 or .NET 4.0:
.load C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\sos.dll
for 32-bit .NET 2.0-3.5
.load C:\WINDOWS\Microsoft.NET\Framework\v4.0.30319\sos.dll
for 32-bit .NET 4.0
.load C:\WINDOWS\Microsoft.NET\Framework64\v2.0.50727\sos.dll
for 64-bit .NET 2.0-3.5
.load C:\WINDOWS\Microsoft.NET\Framework64\v4.0.30319\sos.dll
for 64-bit .NET 4.0
You can now run SoS commands in the Immediate window. I recommend checking the output of !dumpheap -stat
, and if that doesn't pinpoint the problem, check !finalizequeue
.
Notes:
!
(exclamation point).These instructions are courtesy of the incredible Tess from Microsoft, and Mario Hewardt, author of Advanced .NET Debugging.
Once you've identified the leak in terms of which object is leaking, then postulate a cause and implement a fix. Then you can do these steps again to determine for sure whether or not the fix worked.
1) As I said in a comment, if you use events in your code (the AddAlgorithmListeners
makes me suspect this), subscribing to an event can create a "hidden" dependency between objects which is easily forgotten. This dependency can mean that an object is not freed, because someone is still listening to one of it's events. Make sure you unsubscribe from all events when you no longer need to listen to them.
2) Also, I'd like to point you to one (probably not-so-off-topic) issue with your code:
private void RunSeveralAlgorithmObjects()
{
...
algorithm.LexiconEntries = currentEntries;
// ^ when/where is algorithm initialized?
for (...)
{
algorithm = CreateNewAlgorithmObject();
....
}
}
Is algoritm
already initialized when this method is invoked? Otherwise, setting algorithm.LexiconEntries
wouldn't seem like a valid thing to do. This means your method is dependent on some external state, which to me looks like a potential place for bugs creeping in your program logic.
If I understand it correctly, this object contains some state describing the algorithm, and CreateNewAlgorithmObject
derives a new state for algorithm
from the current state. If this was my code, I would make algorithm
an explicit parameter to all your functions, as a signal that the method depends on this object. It would then no longer be hidden "external" state upon which your functions depend.
P.S.: If you don't want to go down that route, the other thing you could consider to make your code more consistent is to turn CreateNewAlgorithmObject
into a void
method and re-assign algorithm
directly inside that method.
Is AddAlgorithmListeners attaching event handlers to events exposed by the algorithm object ? Are the listening objects living longer than the algorithm object - in which case they can continue to keep the algorithm object from being collected.
If yes, try unsubscribing events before you let the object go out of scope.
for (int i = 0; i < intNumberOfAlgorithmObjectsToUse - 1; i++)
{
algorithm = CreateNewAlgorithmObject();
AddAlgorithmListeners();
algorithm.Run();
RemoveAlgoritmListeners(); // See if this fixes your issue.
intCurrentAlgorithmObject++;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With