Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimize nested loop

I'm having a memory issue with my application with a nested for loop and I can't figure out how to improve it. I've tried using linq, but I guess that internally it's the same, because the memory leaks still is there.

EDIT: As I've been requested, I'll provide more information about my problem.

I've got all of my customers (about 400.000) indexed in a Lucene document store. Each customer can be present in more than one agency, exiting some of them than can be in 200-300 agencies.

I need to retrieve all of my customers from the 'global' customer index and build a separate index for each agency, only containing the customers it can see. There are some business rules and security rules that need to be applied to each agency index, so right now, I can't afford to maintain a single customer index for all my agencies.

My process looks like this:

int numDocuments = 400000;

// Get a Lucene Index Searcher from an Index Factory
IndexSearcher searcher = SearcherFactory.Instance.GetSearcher(Enums.CUSTOMER);

// Builds a query that gets everything in the index
Query query = QueryHelper.GetEverythingQuery();
Filter filter = new CachingWrapperFilter(new QueryWrapperFilter(query));

// Sorts by Agency Id
SortField sortField = new SortField("AgencyId, SortField.LONG);
Sort sort = new Sort(sortField);

TopDocs documents = searcher.Search(query, filter, numDocuments, sort);

for (int i = 0; i < numDocuments; i++)
{
     Document document = searcher.Doc(documents.scoreDocs[i].doc);

     // Builds a customer object from the lucene document
     Customer customer = new Customer(document);

     // If this nested loop is removed, the memory doesn't grow
     foreach(Agency agency in customer.Agencies)
     {
          // Gets a writer from a factory for the agency id.
          IndexWriter writer = WriterFactory.Instance.GetWriter(agency.Id);

          // Builds an agency-specific document from the customer
          Document customerDocument = customer.GetAgencyDocument(agency.Id);

          // Adds the document to the agency's lucene index
          writer.AddDocument(customerDocument);
     }
}

EDIT: The solution

The problem was I wasn't reusing the instances of the "Document" object in the inner loop, and that caused an indecent grow of memory usage of my service. Just reusing a single instance of Document for the full process solved my problem.

Thanks everyone.

like image 681
dandel Avatar asked Dec 20 '22 23:12

dandel


2 Answers

What I believe to be happening here is:

You have too much object creation inside the loops. If at all possible do no use the new() keyword inside the loops. Initialize objects that are reusable across the loops and pass them data to work on. DO not construct new objects inside that many loops because garbage collection will become a serious problem and the garbage collector may not be able to keep up with you, and will defer collection.

The first thing you can do to try if this is true, try to force garbage collection every X loops and wait for pending finalizers. If this brings memory down you know that this is the problem. And solving it is easy: just do not create new instances every loop iteration.

like image 80
Marino Šimić Avatar answered Jan 02 '23 03:01

Marino Šimić


First you should re-use your Document and Field instances that you pass to IndexWriter.AddDocument() to minimize memory usage and relieve pressure on the garbage collector.

• Re-use Document and Field instances As of Lucene 2.3 there are new setValue(...) methods that allow you to change the value of a Field. This allows you to re-use a single Field instance across many added documents, which can save substantial GC cost. It's best to create a single Document instance, then add multiple Field instances to it, but hold onto these Field instances and re-use them by changing their values for each added document. For example you might have an idField, bodyField, nameField, storedField1, etc. After the document is added, you then directly change the Field values (idField.setValue(...), etc), and then re-add your Document instance.

Note that you cannot re-use a single Field instance within a Document, and, you should not change a Field's value until the Document containing that Field has been added to the index.

http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

like image 31
Jf Beaulac Avatar answered Jan 02 '23 03:01

Jf Beaulac