If I'm trying to filter results at multiple levels of an IEnumerable<T> object graph, is there a preferred way of chaining extension methods to do this?
I'm open to any extension method and lambda usage, but I'd prefer not to use LINQ syntax to remain consistent with the rest of the codebase.
Is it better to push the filtering to the selector of the SelectMany() method or just to chain another Where() method? Or is there a better solution?
How would I go about identifying the best option? In this test case, everything is directly available in memory. Obviously both samples below are currently producing the same correct results; I'm just looking for a reason one or the other (or another option) would be preferred.
public class Test
{
    // I want the first chapter of a book that's exactly 42 pages, written by
    // an author whose name is Adams, from a library in London.
    public Chapter TestingIEnumerableTExtensionMethods()
    {
        List<Library> libraries = GetLibraries();

        // Option 1: chain a Where() after each SelectMany().
        Chapter chapter = libraries
            .Where(lib => lib.City == "London")
            .SelectMany(lib => lib.Books)
            .Where(b => b.Author == "Adams")
            .SelectMany(b => b.Chapters)
            .First(c => c.NumberOfPages == 42);

        // Option 2: push each filter into the SelectMany() selector.
        Chapter chapter2 = libraries
            .Where(lib => lib.City == "London")
            .SelectMany(lib => lib.Books.Where(b => b.Author == "Adams"))
            .SelectMany(b => b.Chapters.Where(c => c.NumberOfPages == 42))
            .First();

        // Both queries produce the same chapter, so either can be returned.
        return chapter;
    }
}
And here's the sample object graph:
public class Library
{
    public string Name { get; set; }
    public string City { get; set; }
    public List<Book> Books { get; set; }
}
public class Book
{
    public string Name { get; set; }
    public string Author { get; set; }
    public List<Chapter> Chapters { get; set; }
}
public class Chapter
{
    public string Name { get; set; }
    public int NumberOfPages { get; set; }
}
Which is best likely varies based on the LINQ implementation you're using. LINQ to SQL will behave differently from in-memory filtering. The order of the clauses can also affect performance, depending on the data: in a naive implementation, filtering out more records earlier in the sequence leaves less work for the later methods.
For your two examples, I would guess that the performance difference is negligible and would favor the first since it allows easier modification of each clause independent of the others.
As for determining the best option, it's the same as anything else: measure.
I'm guessing the first expression you have will be slightly but insignificantly faster. To really determine if one or the other is faster, you will need to time them, with a profiler or Stopwatch.
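A minimal sketch of timing the two forms with Stopwatch might look like the following. The QueryTiming class, the BuildLibraries generator, the data sizes, and the iteration count are all assumptions made up for illustration; only the two queries and the Library, Book, and Chapter shapes come from the question. Treat any numbers it prints as a rough indication rather than a rigorous benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class Chapter { public string Name { get; set; } public int NumberOfPages { get; set; } }
public class Book { public string Name { get; set; } public string Author { get; set; } public List<Chapter> Chapters { get; set; } }
public class Library { public string Name { get; set; } public string City { get; set; } public List<Book> Books { get; set; } }

public static class QueryTiming
{
    // Hypothetical data generator (not from the question). Exactly one chapter
    // has 42 pages, and it is the last one either query can reach.
    public static List<Library> BuildLibraries(int libraryCount, int booksPerLibrary, int chaptersPerBook)
    {
        var libraries = Enumerable.Range(0, libraryCount).Select(l => new Library
        {
            Name = "Library " + l,
            City = l % 2 == 0 ? "London" : "Paris",
            Books = Enumerable.Range(0, booksPerLibrary).Select(b => new Book
            {
                Name = "Book " + b,
                Author = b % 2 == 0 ? "Adams" : "Other",
                Chapters = Enumerable.Range(0, chaptersPerBook)
                    .Select(c => new Chapter { Name = "Chapter " + c, NumberOfPages = 10 })
                    .ToList()
            }).ToList()
        }).ToList();

        libraries.Last(lib => lib.City == "London")
            .Books.Last(b => b.Author == "Adams")
            .Chapters.Last().NumberOfPages = 42;
        return libraries;
    }

    // First form from the question: a Where() chained after each SelectMany().
    public static Chapter ChainedWhere(List<Library> libraries)
    {
        return libraries
            .Where(lib => lib.City == "London")
            .SelectMany(lib => lib.Books)
            .Where(b => b.Author == "Adams")
            .SelectMany(b => b.Chapters)
            .First(c => c.NumberOfPages == 42);
    }

    // Second form from the question: filters pushed into the selectors.
    public static Chapter NestedWhere(List<Library> libraries)
    {
        return libraries
            .Where(lib => lib.City == "London")
            .SelectMany(lib => lib.Books.Where(b => b.Author == "Adams"))
            .SelectMany(b => b.Chapters.Where(c => c.NumberOfPages == 42))
            .First();
    }

    // Runs the query once untimed as a warm-up, then times the repeated runs.
    public static TimeSpan Time(Func<Chapter> query, int iterations)
    {
        query();
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            query();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    public static void Main()
    {
        // Arbitrary sizes; adjust them to resemble your real data.
        var libraries = BuildLibraries(50, 50, 20);
        Console.WriteLine("Chained Where: " + Time(() => ChainedWhere(libraries), 100).TotalMilliseconds + " ms");
        Console.WriteLine("Nested Where:  " + Time(() => NestedWhere(libraries), 100).TotalMilliseconds + " ms");
    }
}
```

Running each query once before starting the Stopwatch keeps JIT compilation out of the measurement. For differences this small, a profiler or a dedicated benchmarking tool would likely give more trustworthy results than a hand-rolled loop.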
The readability doesn't seem to be strongly affected either way. I prefer the first approach, as it has fewer levels of nesting. It all depends on your personal preference.
It depends on how the underlying LINQ provider works. For LINQ to Objects, both versions would require about the same amount of work in this case. But that's the simplest possible example, so beyond that it's hard to say.
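One way to check the "same amount of work" claim for LINQ to Objects is to count how often each predicate runs. The sketch below does that on a tiny hand-made graph; the PredicateCounting class, its counters, and the sample data are hypothetical and not part of the question. Since both forms are evaluated lazily, I would expect the two sets of counts to match, but that is an assumption to confirm by running it against your own runtime and data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Chapter { public string Name { get; set; } public int NumberOfPages { get; set; } }
public class Book { public string Name { get; set; } public string Author { get; set; } public List<Chapter> Chapters { get; set; } }
public class Library { public string Name { get; set; } public string City { get; set; } public List<Book> Books { get; set; } }

public static class PredicateCounting
{
    // Counters incremented every time the corresponding predicate is evaluated.
    public static int CityChecks, AuthorChecks, PageChecks;

    static bool IsLondon(Library lib) { CityChecks++; return lib.City == "London"; }
    static bool IsAdams(Book book) { AuthorChecks++; return book.Author == "Adams"; }
    static bool Has42Pages(Chapter chapter) { PageChecks++; return chapter.NumberOfPages == 42; }

    // Hypothetical sample graph: one non-matching library, then a London
    // library whose second book is by Adams and contains a 42-page chapter.
    public static List<Library> BuildSample()
    {
        return new List<Library>
        {
            new Library { Name = "A", City = "Paris", Books = new List<Book>
            {
                new Book { Name = "P1", Author = "Adams", Chapters = new List<Chapter>
                {
                    new Chapter { Name = "P1-1", NumberOfPages = 42 }
                } }
            } },
            new Library { Name = "B", City = "London", Books = new List<Book>
            {
                new Book { Name = "L1", Author = "Other", Chapters = new List<Chapter>
                {
                    new Chapter { Name = "L1-1", NumberOfPages = 42 }
                } },
                new Book { Name = "L2", Author = "Adams", Chapters = new List<Chapter>
                {
                    new Chapter { Name = "L2-1", NumberOfPages = 10 },
                    new Chapter { Name = "L2-2", NumberOfPages = 42 },
                    new Chapter { Name = "L2-3", NumberOfPages = 42 }
                } }
            } }
        };
    }

    // First form from the question, with counting predicates substituted in.
    public static Chapter ChainedWhere(List<Library> libraries)
    {
        return libraries
            .Where(lib => IsLondon(lib))
            .SelectMany(lib => lib.Books)
            .Where(b => IsAdams(b))
            .SelectMany(b => b.Chapters)
            .First(c => Has42Pages(c));
    }

    // Second form from the question, with counting predicates substituted in.
    public static Chapter NestedWhere(List<Library> libraries)
    {
        return libraries
            .Where(lib => IsLondon(lib))
            .SelectMany(lib => lib.Books.Where(b => IsAdams(b)))
            .SelectMany(b => b.Chapters.Where(c => Has42Pages(c)))
            .First();
    }

    // Resets the counters, runs the query, and returns
    // { city checks, author checks, page checks }.
    public static int[] Count(Func<Chapter> query)
    {
        CityChecks = AuthorChecks = PageChecks = 0;
        query();
        return new[] { CityChecks, AuthorChecks, PageChecks };
    }

    public static void Main()
    {
        var libraries = BuildSample();
        Console.WriteLine("Chained Where: " + string.Join(", ", Count(() => ChainedWhere(libraries))));
        Console.WriteLine("Nested Where:  " + string.Join(", ", Count(() => NestedWhere(libraries))));
    }
}
```

If the counts agree on data shaped like yours, the choice between the two forms comes down to readability rather than work done, at least for in-memory sequences. A provider that translates the query (such as LINQ to SQL) would need to be examined separately.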