Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Separate functions into validation and implementation? Why?

I was reading a C# book in which the author (some dude named Jon Skeet) implements a Where function like

public static IEnumerable<T> Where<T> ( this IEnumerable<T> source, Funct<T,bool> predicate ) 
{
    if ( source == null || predicate == null ) 
    {
        throw new ArgumentNullException();
    }
    return WhereImpl(source, predicate);
}

public static IEnumerable<T> WhereImpl<T> ( IEnumerable <T> source, Func<T,bool> predicate ) 
{
    foreach ( T item in source ) 
    {
      if ( predicate(item) )  
      {
         yield return item;
      }
    }

}

Now, I fully understand how this works and that it's equivalent to

public static IEnumerable<T> Where<T> ( this IEnumerable<T> source, Funct<T,bool> predicate ) 
{
    if ( source == null || predicate == null ) 
    {
        throw new ArgumentNullException();
    }
    foreach ( T item in source ) 
    {
      if ( predicate(item) )  
      {
         yield return item;
      }
    }
}

which brings up the question of why would one separate these into 2 functions given that there would be memory/time overhead and of course more code. I always validate parameters and if I start writing like this example then I'll be writing twice as much code. Is there some school of thought which holds that validation and implementation should be separate functions?

like image 511
Subpar Web Dev Avatar asked Dec 09 '15 19:12

Subpar Web Dev


3 Answers

The reason is that the iterator block is always lazy. Unless you call GetEnumerator() and then MoveNext(), the code in the method won't get executed.

In other words, consider this call to your "equivalent" method:

var ignored = OtherEnumerable.Where<string>(null, null);

No exception is thrown, because you're not calling GetEnumerator() and then MoveNext(). Compare that with my version where the exception is thrown immediately regardless of how the return value is used... because it only calls the method with the iterator block after validating eagerly.

Note that async/await has similar issues - if you have:

public async Task FooAsync(string x)
{
    if (x == null)
    {
        throw new ArgumentNullException(nameof(x));
    }
    // Do some stuff including awaiting
}

If you call this, you'll end up getting a faulted Task - rather than a NullReferenceException being thrown. If you await the returned Task, then the exception will be thrown, but that may not be where you called the method. That's okay in most cases, but worth knowing about.

like image 97
Jon Skeet Avatar answered Nov 07 '22 19:11

Jon Skeet


It may depend of the scenario and your coding style. Jon Skeet is absolutely right about why they should be separated when you're using yield to create iterators.

BTW, I thought it could be interesting adding my two cents here: same code using Code Contracts (i.e. design by contract) behaves in a different way.

Pre-conditions aren't part of the iterator block, thus, the following code will throw a contract exception immediately if the whole pre-conditions aren't met:

public static class Test
{
    public static IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        Contract.Requires(source != null);
        Contract.Requires(predicate != null);

        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }
}

// This throws a contract exception directly, no need of 
// enumerating the returned enumerable
Test.Where<string>(null, null);
like image 27
Matías Fidemraizer Avatar answered Nov 07 '22 19:11

Matías Fidemraizer


A method using yield return looks very nice and simple, but if you examine the compiled code you will notice it gets quite complicated.

The compiler generates a new class for you with state machine logic to support the enumeration. For the 2nd Where method, it's around 160 lines of code after decompilation. The actual Where method is compiled to

[IteratorStateMachine(typeof(IterarorTest.<Where>d__0<>))]
public static IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
    IterarorTest.<Where>d__0<T> expr_07 = new IterarorTest.<Where>d__0<T>(-2);
    expr_07.<>3__source = source;
    expr_07.<>3__predicate = predicate;
    return expr_07;
}

As you can see, no arguments are checked in this method. It just returns a new iterator.

The arguments are checked in the auto-generated class' MoveNext method (the code is a bit too long to post here).

On the other hand, if you move the yield return to another method, the arguments are checked immediately when you call the Where method - which is the expected behavior here.

Edit

As noticed by Matias Fidemraizer, code contracts also solve the problem - the contract checks are inserted in the Where method

public static IEnumerable<T> Where<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
    __ContractsRuntime.Requires(source != null, null, "source != null");
    __ContractsRuntime.Requires(predicate != null, null, "predicate != null");
    IterarorTest.<Where>d__0<T> expr_27 = new IterarorTest.<Where>d__0<T>(-2);
    expr_27.<>3__source = source;
    expr_27.<>3__predicate = predicate;
    return expr_27;
}
like image 1
Jakub Lortz Avatar answered Nov 07 '22 17:11

Jakub Lortz