Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Memory optimized OrderBy and Take?

I have 9 GB of data, and I want only 10 rows. When I do:

 data.OrderBy(datum => datum.Column1)
     .Take(10)
     .ToArray();

I get an OutOfMemoryException. I would like to use an OrderByAndTake method, optimized for lower memory consumption. It's easy to write, but I guess someone already did. Where can I find it.

Edit: It's Linq-to-objects. The data comes from a file. Each row can be discarded if its value for Column1 is smaller than the current list of 10 biggest values.

like image 714
Jader Dias Avatar asked May 20 '11 18:05

Jader Dias


2 Answers

I'm assuming you're doing this in Linq to Objects. You could do something like...

var best = data
    .Aggregate(new List<T>(), (soFar, current) => soFar
                                                 .Concat(new [] { current })
                                                 .OrderBy(datum => datum.Column1)
                                                 .Take(10)
                                                 .ToList());

In this way, not all the items need to be kept in a new sorted collection, only the best 10 you're interested in.

This was the least code way. Since you know the soFar list is sorted, testing where/if to insert current could be optimized. I didn't feel like doing ALL the work for you. ;-)

PS: Replace T with whatever your type is.

EDIT: Thinking about it, the most efficient way would actually be a plain old foreach that compares each item to the running list of best 10.

like image 174
Jim Bolla Avatar answered Oct 19 '22 22:10

Jim Bolla


It figures: OrderBy is a Sort and that requires storing all the elements (deferred execution is cancelled).

It ought to work efficiently when data is an IQueryable, then it's up to the database.


  // just 4 fun
  public static IEnumerable<T> TakeDistinctMin<T, TKey>(this IEnumerable<T> @this, 
        int n, Func<T, TKey> selector)            
         where TKey: IComparable<TKey>
  {
        var tops = new SortedList<TKey, T>(n+1);

        foreach (var item in @this)
        {
            TKey k = selector(item);

            if (tops.ContainsKey(k))
                continue;

            if (tops.Count < n)
            {
                tops.Add(k, item);
            }
            else if (k.CompareTo(tops.Keys[tops.Count - 1]) < 0)
            {
                tops.Add(k, item);
                tops.RemoveAt(n);
            }                                    
        }

        return tops.Values;
    }
like image 20
Henk Holterman Avatar answered Oct 19 '22 22:10

Henk Holterman