
Splitting a deferred IEnumerable<T> into two sequences without re-evaluation?

I have a method that needs to process an incoming sequence of commands and split the results into different buckets depending on some properties of the result. For example:

class Pets
{
    public IEnumerable<Cat> Cats { get; set; }
    public IEnumerable<Dog> Dogs { get; set; }
}

Pets GetPets(IEnumerable<PetRequest> requests) { ... }

The underlying model is perfectly capable of handling the entire sequence of PetRequest elements at once, and a PetRequest is mostly generic information like an ID, so it makes no sense to split the requests at the input. But the provider doesn't actually give back Cat and Dog instances, just a generic data structure:

class PetProvider
{
    IEnumerable<PetData> GetPets(IEnumerable<PetRequest> requests)
    {
        return HandleAllRequests(requests);
    }
}

I've named the response type PetData instead of Pet to clearly indicate that it is not a superclass of Cat or Dog - in other words, conversion to Cat or Dog is a mapping process. The other thing to keep in mind is that HandleAllRequests is expensive, e.g. a database query, so I really don't want to repeat it, and I would prefer to avoid caching the results in memory using ToArray() or the like, because there might be thousands or millions of results (I have a lot of pets).

So far I've been able to throw together this clumsy hack:

Pets GetPets(IEnumerable<PetRequest> requests)
{
    var data = petProvider.GetPets(requests);
    var dataGroups = 
        from d in data
        group d by d.Sound into g
        select new { Sound = g.Key, PetData = g };
    IEnumerable<Cat> cats = null;
    IEnumerable<Dog> dogs = null;
    foreach (var g in dataGroups)
        if (g.Sound == "Bark")
            dogs = g.PetData.Select(d => ConvertDog(d));
        else if (g.Sound == "Meow")
            cats = g.PetData.Select(d => ConvertCat(d));
    return new Pets { Cats = cats, Dogs = dogs };
}

This technically works, in the sense that it doesn't cause the PetData results to be enumerated twice, but it has two major problems:

  1. It looks like a giant pimple on the code; it smacks of the awful imperative style we always used to have to employ in the pre-LINQ framework 2.0.

  2. It ends up being a thoroughly pointless exercise, because the GroupBy method is just caching all those results in memory, which means I'm really no better off than if I'd just been lazy and done a ToList() in the first place and attached a few predicates.
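The buffering described in point 2 can be observed directly. The following is a minimal sketch, not part of the original code: the `GroupByBufferingDemo` class, its `Pulled` counter and the hard-coded sounds are all made up for illustration. It counts how many source items are read when only the first group is requested.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class GroupByBufferingDemo
{
    // Counts how many source items have been produced so far.
    public static int Pulled;

    // Hypothetical stand-in for the expensive deferred source.
    static IEnumerable<string> Sounds()
    {
        foreach (var s in new[] { "Meow", "Bark", "Meow", "Bark" })
        {
            Pulled++;
            yield return s;
        }
    }

    public static int PulledBeforeFirstGroup()
    {
        Pulled = 0;
        var groups = Sounds().GroupBy(s => s);   // deferred: nothing read yet
        using (var e = groups.GetEnumerator())
        {
            e.MoveNext();                        // ask for the first group only
            return Pulled;                       // the whole source has been read
        }
    }

    static void Main()
    {
        Console.WriteLine(PulledBeforeFirstGroup());   // prints 4
    }
}
```

All four items are read before the first group is handed out, so the grouping holds every element in memory, just as a ToList() would.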

So to restate the question:

Is it possible to split a single deferred IEnumerable<T> instance into two IEnumerable<?> instances, without performing any eager evaluations, caching results in memory, or having to re-evaluate the original IEnumerable<T> a second time?

Basically, this would be the reverse of a Concat operation. The fact that there isn't already one in the .NET framework is a strong indication that this may not even be possible, but I thought it wouldn't hurt to ask anyway.

P.S. Please don't tell me to create a Pet superclass and just return an IEnumerable<Pet>. I used Cat and Dog as fun examples, but in reality the result types are more like Item and Error - they are both derived from the same generic data but otherwise have nothing in common at all.

Aaronaught asked Jun 14 '11 16:06



2 Answers

Fundamentally, no. Imagine if it were possible. Then consider what happens if I do:

foreach (Cat cat in pets.Cats)
{
    ...
}

foreach (Dog dog in pets.Dogs)
{
    ...
}

That needs to handle all the cats first, and then all the dogs... so what could happen with the original sequence if the first element is a Dog? It either has to cache it or skip it - it can't return it, because we're still asking for Cats.

You could implement something which only caches as much as it needs to, but that's likely to be the whole of one sequence, as typical usage is to completely evaluate one sequence or the other.
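As a rough illustration of "caches only as much as it needs to", here is a minimal sketch. The `Splitter<T>` class, its `Left`/`Right` properties and the `goesLeft` predicate are hypothetical names, not anything in the .NET framework. It assumes single-threaded use and that each side is enumerated at most once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: splits one sequence into two in a single pass,
// buffering only the items that belong to the side not currently being read.
public sealed class Splitter<T>
{
    private readonly IEnumerator<T> source;
    private readonly Func<T, bool> goesLeft;
    private readonly Queue<T> leftBuffer = new Queue<T>();
    private readonly Queue<T> rightBuffer = new Queue<T>();
    private bool done;

    public Splitter(IEnumerable<T> source, Func<T, bool> goesLeft)
    {
        this.source = source.GetEnumerator();
        this.goesLeft = goesLeft;
    }

    // Each side is single-use: enumerate it at most once.
    public IEnumerable<T> Left => Drain(leftBuffer, rightBuffer, wantLeft: true);
    public IEnumerable<T> Right => Drain(rightBuffer, leftBuffer, wantLeft: false);

    private IEnumerable<T> Drain(Queue<T> mine, Queue<T> other, bool wantLeft)
    {
        while (true)
        {
            // Serve anything the other side already set aside for us.
            if (mine.Count > 0) { yield return mine.Dequeue(); continue; }

            // Otherwise advance the shared source by one item.
            if (done || !source.MoveNext())
            {
                done = true;
                source.Dispose();
                yield break;
            }

            T item = source.Current;
            if (goesLeft(item) == wantLeft) yield return item;
            else other.Enqueue(item);   // belongs to the other side: cache it
        }
    }
}

static class SplitterDemo
{
    static void Main()
    {
        var s = new Splitter<int>(Enumerable.Range(1, 6), n => n % 2 == 0);
        Console.WriteLine(string.Join(",", s.Left));    // prints 2,4,6
        Console.WriteLine(string.Join(",", s.Right));   // prints 1,3,5
    }
}
```

In the demo, reading Left to the end queues every odd number before Right is touched, which is exactly the "whole of one sequence" case described above. The buffers only stay small if the two sides are consumed in an interleaved fashion.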

If at all possible, you really just want to handle pets (whether cats or dogs) as you fetch them. Would it be feasible to provide an Action<Cat> and an Action<Dog> and execute the right handler for each item?
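A minimal sketch of that handler-based approach might look like the following. The `PetDispatcher.ProcessPets` method is a made-up name, and the `PetData`, `Cat` and `Dog` types and the converters are simplified stand-ins, since the question does not show the real ones.

```csharp
using System;
using System.Collections.Generic;

// Stand-in types; the real PetData, Cat and Dog are not shown in the question.
class PetData { public string Sound; public string Name; }
class Cat { public string Name; }
class Dog { public string Name; }

static class PetDispatcher
{
    // Hypothetical converters standing in for ConvertCat / ConvertDog.
    static Cat ConvertCat(PetData d) => new Cat { Name = d.Name };
    static Dog ConvertDog(PetData d) => new Dog { Name = d.Name };

    // Walks the source exactly once and hands each item to the matching
    // handler as soon as it arrives; nothing is buffered.
    public static void ProcessPets(
        IEnumerable<PetData> data, Action<Cat> onCat, Action<Dog> onDog)
    {
        foreach (var d in data)
        {
            if (d.Sound == "Meow") onCat(ConvertCat(d));
            else if (d.Sound == "Bark") onDog(ConvertDog(d));
        }
    }
}
```

The caller no longer receives two sequences. It supplies what should happen to each kind of result, for example `PetDispatcher.ProcessPets(data, cat => Save(cat), dog => Log(dog))`, where `Save` and `Log` are placeholders for whatever the caller does with each result.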

Jon Skeet answered Nov 10 '22 18:11



What Jon said (I'm sure I'm the 1 millionth person to say that).

I'd probably just go old-school and do:

// Fill both buckets in a single pass over the data.
List<Cat> cats = new List<Cat>();
List<Dog> dogs = new List<Dog>();

foreach (var pet in data)
{
    if (pet.Sound == "Bark")
        dogs.Add(ConvertDog(pet));
    else if (pet.Sound == "Meow")
        cats.Add(ConvertCat(pet));
}

But I realise this is not exactly what you want to do - but then you did say re-evaluation - and this does only evaluate once :)

Andras Zoltan answered Nov 10 '22 19:11
