The following code takes 5 minutes to run on 100,000 rows. That seems crazy to me. What am I doing wrong?
var query =
from foo in fooStuff.AsEnumerable()
group foo by foo.Field<Int64>("FooID") into g
select new
{
FooID = g.Key,
FooTier = g.Min(foo => foo.Field<int>("Tier"))
};
Note: On Mono.
You are materializing all the entities when you call AsEnumerable(), so your grouping is being done in memory. Try removing that part so that the grouping is done at the database level:
var query =
from foo in fooStuff
group foo by foo.FooID into g
select new
{
FooID = g.Key,
FooTier = g.Min(foo => foo.Tier)
};
It is not a direct comparision and isn't on Mono, but I have some code which does something similar with a 6MB xml file which I read into a DataSet and it has 30,000 rows and takes 0.5 seconds, so I don't think it is the groupby itself that causes the problem.
To diagnose further, I would suggest
Testing how long it takes to read the information into a list, ie
var fooList = fooStuff.AsEnumerable().ToList();
Testing how long it takes if you change the query to use fooList instead of fooStuff
Testing how long it takes if you remove FooTier = g.Min(foo => foo.Tier) from the select
Separate the .Field<> reflection from the groupby and time each section, ie first read the information from the datatable into a list , eg
var list2 =
(from foo in fooStuff.AsEnumerable()
select new {
FooID = foo.Field<Int64>("FooID")
Tier = foo.Field<int>("Tier")
}).ToList();
Then you can query this list
var query =
from foo in list2
group foo by foo.FooID into g
select new
{
FooID = g.Key,
FooTier = g.Min(foo => foo.Tier)
};
var results = query.ToList();
If this query is slow, it would suggest that there is something wrong with mono's implementation of GroupBy. You might be able to verify that by using something like this
public static Dictionary<TKey, List<TSrc>> TestGroupBy<TSrc, TKey>
(this IEnumerable<TSrc> src, Func<TSrc,TKey> groupFunc)
{
var dict= new Dictionary<TKey, List<TSrc>>();
foreach (TSrc s in src)
{
TKey key = groupFunc(s);
List<TSrc> list ;
if (!dict.TryGetValue(key, out list))
{
list = new List<TSrc>();
dict.Add(key, list);
}
list.Add(s);
}
return dict;
}
And to use it
var results = list2.TestGroupBy(r=>r.FooID)
.Select(r=> new { FooID = r.Key, FooTier = r.Value.Min(r1=>r1.Tier)} );
Note, this is not meant as a replacement for groupby and does not cope with null keys but should be enough to determine if their is a problem with groupby (assuming mono's implementation of Dictionary and List are ok).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With