
Which is the best way to improve memory usage when you collect a large data set before processing it? (.NET)

Tags:

c#

.net

When I have to fetch gigabytes of data, store it in a collection, and then process it, I run out of memory. So instead of:

 public class Program
 {
     public IEnumerable<SomeClass> GetObjects()
     {
         var list = new List<SomeClass>();
         // TryGetNext is a hypothetical placeholder for the "get" implementation
         while (TryGetNext(out SomeClass item))
         {
             list.Add(item); // every item stays in memory until the list is released
         }
         return list;
     }

     public void ProcessObjects(IEnumerable<SomeClass> objects)
     {
         foreach (var item in objects)
         {
             // process implementation
         }
     }

     void Main()
     {
         var objects = GetObjects();
         ProcessObjects(objects);
     }
 }

I need to fetch and process in the same loop:

 public class Program
 {
     void ProcessObject(SomeClass item)
     {
         // process implementation
     }

     public void GetAndProcessObjects()
     {
         // TryGetNext is a hypothetical placeholder for the "get" implementation
         while (TryGetNext(out SomeClass item))
         {
             ProcessObject(item); // handle each item as soon as it is fetched
         }
     }

     void Main()
     {
         GetAndProcessObjects();
     }
 }

Is there a better way?

Jader Dias asked Dec 01 '22 12:12



2 Answers

You ought to leverage C#'s iterator blocks and use the yield return statement to do something like this:

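 public class Program
 {
     public IEnumerable<SomeClass> GetObjects()
     {
         // TryGetNext is a hypothetical placeholder for the "get" implementation
         while (TryGetNext(out SomeClass item))
         {
             yield return item; // hand one item to the caller, then pause
         }
     }

     public void ProcessObjects(IEnumerable<SomeClass> objects)
     {
         foreach (var item in objects)
         {
             // process implementation
         }
     }

     void Main()
     {
         var objects = GetObjects(); // nothing is fetched yet
         ProcessObjects(objects);    // items are fetched as the loop asks for them
     }
 }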

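This would allow you to stream each object to the caller instead of building the entire sequence in memory. As long as the consumer does not buffer the results (for example by calling ToList()), you would only need to keep one object in memory at a time.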
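Here is a minimal, runnable sketch of that streaming behaviour. The `Record` type, the `ReadRecords` source and the `Produced` counter are hypothetical stand-ins for the real "get" implementation and are not part of the question's code. The counter is only there to show that the iterator body does not run until the consumer starts enumerating.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical payload type standing in for SomeClass.
public sealed class Record
{
    public int Id { get; }
    public Record(int id) => Id = id;
}

public static class StreamingDemo
{
    // Number of records the source has produced so far (illustration only).
    public static int Produced;

    // Hypothetical source: stands in for reading from a file, socket, or database.
    public static IEnumerable<Record> ReadRecords(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Produced++;
            // Execution pauses here until the consumer asks for the next item.
            yield return new Record(i);
        }
    }

    // Consumes the sequence one item at a time and returns the sum of the Ids.
    public static long Process(IEnumerable<Record> records)
    {
        long sum = 0;
        foreach (var record in records)
        {
            sum += record.Id;
        }
        return sum;
    }

    public static void Main()
    {
        var records = ReadRecords(1_000_000); // nothing is read yet
        Console.WriteLine(Produced);          // 0: the iterator body has not started
        Console.WriteLine(Process(records));  // 499999500000
    }
}
```

Each `Record` becomes eligible for garbage collection as soon as the loop moves on, so peak memory stays roughly constant regardless of `count`.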

Andrew Hare answered Dec 18 '22 10:12



Don't use a List, which requires all the data to be present in memory at once. Use IEnumerable<T> and produce the data on demand, or better, use IQueryable<T> and have the entire execution of the query deferred until the data are required.
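As a rough illustration of producing data on demand, the sketch below composes LINQ operators over a lazy `IEnumerable<int>`. The `Numbers` source and the `Generated` counter are hypothetical and exist only for the demonstration. An `IQueryable<T>` query defers execution in a similar way, but it needs a query provider (such as an ORM) to translate the query for the data source, which is outside the scope of this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Number of values the source has generated so far (illustration only).
    public static int Generated;

    // Hypothetical unbounded source; each value is created only when requested.
    public static IEnumerable<int> Numbers()
    {
        for (int i = 1; ; i++)
        {
            Generated++;
            yield return i;
        }
    }

    // Builds a query. Where and Take are lazy, so no values are generated here.
    public static IEnumerable<int> FirstEvens(int howMany) =>
        Numbers().Where(n => n % 2 == 0).Take(howMany);

    public static void Main()
    {
        var query = FirstEvens(3);
        Console.WriteLine(Generated);               // 0: the query is defined, not executed
        Console.WriteLine(string.Join(",", query)); // 2,4,6
    }
}
```

Although `Numbers` never ends, the program terminates because `Take` stops pulling from the source once it has enough values. Materializing the same source into a `List<int>` first would never finish.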

Alternatively, don't keep the data in memory at all, but rather save the data to a database for processing. When processing is complete, then query the database for the results.

John Saunders answered Dec 18 '22 11:12
