Type system oddity: Enumerable.Cast<int>()

Tags:

c#

linq

Consider:

enum Foo
{
    Bar,
    Quux,
}

void Main()
{
    var enumValues = new[] { Foo.Bar, Foo.Quux, };
    Console.WriteLine(enumValues.GetType());         // output: Foo[]
    Console.WriteLine(enumValues.First().GetType()); // output: Foo

    var intValues = enumValues.Cast<int>();
    Console.WriteLine(intValues.GetType());         // output: Foo[] ???
    Console.WriteLine(intValues.First().GetType()); // output: Int32

    Console.WriteLine(ReferenceEquals(enumValues, intValues)); // true

    var intValuesArray = intValues.ToArray();
    Console.WriteLine(intValuesArray.GetType());         // output: Int32[]
    Console.WriteLine(intValuesArray.First().GetType()); // output: Int32

    Console.WriteLine(ReferenceEquals(intValues, intValuesArray)); // false
}

Note the third Console.WriteLine - I'm expecting it to print the type to which the array is being cast (Int32[]), but instead it prints the original type (Foo[])! And ReferenceEquals confirms that indeed, the first Cast<int> call is effectively a no-op.

So I peeked into the source of Enumerable.Cast and found the following:

public static IEnumerable<TResult> Cast<TResult>(this IEnumerable source) 
{
    IEnumerable<TResult> typedSource = source as IEnumerable<TResult>;
    if (typedSource != null) return typedSource;
    if (source == null) throw Error.ArgumentNull("source");
    return CastIterator<TResult>(source);
}

For our purposes, the only lines that matter are the first two, because they're the only ones that get executed. That means that the line:

var intValues = enumValues.Cast<int>();

is effectively translated into:

var intValues = ((IEnumerable)enumValues) as IEnumerable<int>;

However, removing the cast to the non-generic IEnumerable causes a compiler error:

var intValues = enumValues as IEnumerable<int>; // error

I've been scratching my head as to why this is, and I think it's got to do with the fact that Array implements the non-generic IEnumerable and that there is all sorts of special casing for arrays in C#, but I'm honestly not sure. Please can someone explain to me what's going on here and why?

Ian Kemp asked Mar 05 '19 12:03



3 Answers

"I think it's got to do with the fact that Array implements the non-generic IEnumerable and that there is all sorts of special casing for arrays in C#"

Yes, you're correct. More precisely, it has to do with array variance. Array variance is a loosening of the type system that happened in .NET 1.0; it was problematic, but it allowed some tricky cases to be worked around. Here's an example:

string[] first = {"a", "b", "c"};
object[] second = first;
string[] third = (string[])second;
Console.WriteLine(third[0]); // Prints "a"

This is quite weak because it doesn't stop us doing:

string[] first = {"a", "b", "c"};
object[] second = first;
Uri[] third = (Uri[])second; // InvalidCastException

And there are worse cases again.

Array variance is less useful (if it was ever justified, which some would debate) now that we have generics (from .NET 2.0 and C# 2 onwards) than it was before, when it let us overcome some of the limitations that the lack of generics imposed on us.

The rules allow implicit casts to bases of reference types (e.g. string[] to object[]), explicit casts to derived reference types (e.g. object[] to string[]), and explicit casts from Array or IEnumerable to any type of array. Also (this is the sticky part), Array and IEnumerable references to arrays of primitive types or enums can be cast to arrays of primitive types or enums of the same size (int, uint and int-based enums are all the same size).

This means that the attempted optimisation of not casting individual values unnecessarily when one can just cast the source directly can have the surprising effects you note.
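A minimal sketch of that last rule, not taken from the answer itself: the helper names IsIntArray, IsUIntArray and IsLongArray are made up for this example, and the checks deliberately go through object so that the runtime, rather than the C# compiler, decides whether the conversion is allowed.

```csharp
using System;

enum Foo { Bar, Quux }   // int-based enum, as in the question

static class ArrayVarianceSketch
{
    // Hypothetical helpers. The parameter is typed as object so that the
    // type test is deferred to the runtime instead of the C# compiler.
    public static bool IsIntArray(object array) => array is int[];
    public static bool IsUIntArray(object array) => array is uint[];
    public static bool IsLongArray(object array) => array is long[];

    static void Main()
    {
        var enumValues = new[] { Foo.Bar, Foo.Quux };

        Console.WriteLine(IsIntArray(enumValues));   // True: Foo is the same size as int
        Console.WriteLine(IsUIntArray(enumValues));  // True: uint is the same size too
        Console.WriteLine(IsLongArray(enumValues));  // False: long is a different size
    }
}
```

The same checks written directly against a Foo[] variable would be judged by the compiler instead, which is why the helpers take object.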

A practical effect of this that has tripped me up in the past: enumValues.Cast<StringComparison>().ToArray() and enumValues.Cast<StringComparison>().ToList() would fail with ArrayTypeMismatchException, even though enumValues.Cast<StringComparison>().Skip(0).ToArray() would succeed. As well as Cast<TResult>() using the optimisation noted, ToArray<TSource>() and ToList<TSource>() use the optimisation of calling ICollection<T>.CopyTo() internally, and on arrays that fails with the sort of variance involved here.

In .NET Core there was a loosening of the restrictions on CopyTo() with arrays that means this code succeeds rather than throwing, but I forget at which version that change was introduced.
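A hedged sketch of how one might sidestep the problem regardless of runtime version. The Select-based helper ConvertEachElement is an assumption of this example, not something the answer proposes; it converts each element with an explicit enum-to-enum cast, so the result is built as a genuine StringComparison[] and no array variance is involved. Whether the direct Cast + ToArray call throws depends on the runtime, so the sketch only reports what happens.

```csharp
using System;
using System.Linq;

enum Foo { Bar, Quux }   // int-based enum, as in the question

static class EnumArrayConversionSketch
{
    // Hypothetical helper: converts element by element, so the resulting
    // array is allocated as StringComparison[] from the start.
    public static StringComparison[] ConvertEachElement(Foo[] values) =>
        values.Select(v => (StringComparison)v).ToArray();

    static void Main()
    {
        var enumValues = new[] { Foo.Bar, Foo.Quux };

        try
        {
            // May throw ArrayTypeMismatchException, depending on the runtime.
            var direct = enumValues.Cast<StringComparison>().ToArray();
            Console.WriteLine("Cast + ToArray succeeded: " + direct.GetType());
        }
        catch (ArrayTypeMismatchException ex)
        {
            Console.WriteLine("Cast + ToArray failed: " + ex.GetType().Name);
        }

        var converted = ConvertEachElement(enumValues);
        Console.WriteLine(converted.GetType());   // System.StringComparison[]
    }
}
```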

Jon Hanna answered Oct 24 '22 09:10


Jon Hanna's answer is pretty much correct, but I can add a few small details.

"I'm expecting it to print the type to which the array is being cast (Int32[]), but instead it prints the original type (Foo[])!"

What should you have expected? The contract of Cast<int> is that the object that is returned can be used in any context that expects an IEnumerable<int>, and you got that. That's all you should have expected; the rest is implementation details.

Now, I grant you that the fact that a Foo[] can be used as IEnumerable<int> is odd, but remember, a Foo is just an extremely thin wrapper around an int. The size of a Foo is the same as the size of an int, the contents of a Foo are the same as the contents of an int, and so the CLR in its wisdom answers "yes" when asked "is this Foo[] usable as an IEnumerable<int>?"
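To illustrate the "contract versus implementation detail" point, here is a small sketch (the helper names CastReturnsSource and Sum are invented for this example). It compares a Foo[] with a List<Foo>: the array can be handed back as-is, while the list cannot be viewed as IEnumerable<int> and so has to go through the element-by-element path. Either way the caller gets something it can enumerate as ints.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

enum Foo { Bar, Quux }   // int-based enum, as in the question

static class CastContractSketch
{
    // Hypothetical helper: true when Cast<int>() handed back the very same
    // object it was given (the reference-preserving fast path).
    public static bool CastReturnsSource(IEnumerable source) =>
        ReferenceEquals(source, source.Cast<int>());

    // Hypothetical helper: consumes the sequence purely as IEnumerable<int>.
    public static int Sum(IEnumerable<int> values)
    {
        int total = 0;
        foreach (int value in values) total += value;
        return total;
    }

    static void Main()
    {
        var array = new[] { Foo.Bar, Foo.Quux };
        var list = new List<Foo> { Foo.Bar, Foo.Quux };

        Console.WriteLine(CastReturnsSource(array)); // True, as the question observed
        Console.WriteLine(CastReturnsSource(list));  // False: a new sequence object is created

        // Both results honour the contract: they enumerate as ints (0 + 1).
        Console.WriteLine(Sum(array.Cast<int>()));   // 1
        Console.WriteLine(Sum(list.Cast<int>()));    // 1
    }
}
```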

But what about this?

"enumValues as IEnumerable<int> causes a compiler error"

This sure sounds like a contradiction, doesn't it?

The problem is that the rules of C# and the rules of the CLR do not match in this situation.

  • The CLR says "a Foo[] can be used as an int[], and a uint[] and ... ".
  • The C# type analyzer is more restrictive. It does not use all of the lax covariance rules of the CLR. The C# type analyzer will allow string[] to be used as object[], and will allow IEnumerable<string> to be used as IEnumerable<object> but it will not allow Foo[] to be used as int[] or IEnumerable<int> and so on. C# only allows covariance when the varying types are both reference types. The CLR allows covariance when the varying types are reference types, or int, uint, or int-sized enums.

The C# compiler "knows" that the conversion from Foo[] to IEnumerable<int> cannot succeed in the C# type system, and so it produces a compiler error; a conversion in C# must be possible to be legal. The fact that this is possible in the more-lenient CLR type system is not considered by the compiler.

By inserting a cast to object or IEnumerable or whatever, you are telling the C# compiler to stop using the rules of C#, and start letting the runtime figure it out. By removing the cast, you're saying that you want the C# compiler to render its judgment, and it does.
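A short sketch of that difference, with the rejected form left as a comment so the file still compiles. The helper name ViaObject is made up for this example; the only thing it does is hide the static type Foo[] from the compiler by going through object.

```csharp
using System;
using System.Collections.Generic;

enum Foo { Bar, Quux }   // int-based enum, as in the question

static class CompileTimeVersusRuntimeSketch
{
    // Hypothetical helper: the intermediate cast to object means the
    // conversion is checked by the runtime, not by the C# compiler.
    public static IEnumerable<int> ViaObject(Foo[] values) =>
        (object)values as IEnumerable<int>;

    static void Main()
    {
        var enumValues = new[] { Foo.Bar, Foo.Quux };

        // The question reports that this form is rejected by the compiler:
        // var direct = enumValues as IEnumerable<int>;

        var viaObject = ViaObject(enumValues);
        Console.WriteLine(viaObject != null);                      // True: the runtime says yes
        Console.WriteLine(ReferenceEquals(enumValues, viaObject)); // True: same array, no copy
    }
}
```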

So now we have a language design problem; plainly we have an inconsistency here. There are several ways out of this inconsistency.

  • C# could match the rules of the CLR, and allow covariant conversions amongst integer types.
  • C# could generate the as operator so that it implements the rules of C# at runtime; basically, it would have to detect legal-in-the-CLR but illegal-in-C# conversions and disallow them, making all such conversions slower. Moreover, it would then require your scenario to go to the memory-allocating slow path of Cast<T> instead of the reference-preserving fast path.
  • C# could be inconsistent and live with the inconsistency.

The second choice is obviously unfeasible. It only adds costs and has no benefits other than consistency.

It comes down then to the first and third choices, and the C# 1.0 design team chose the third. (Remember, the C# 1.0 design team did not know that they would be adding generics in C# 2.0 or generic variance in C# 4.0.) For the C# 1.0 design team the question was whether enumValues as int[] should be legal or not, and they decided not. Then that design decision was made again for C# 2.0 and C# 4.0.

There are plenty of principled arguments on either side but in practice this situation almost never arises in real world code, and the inconsistency almost never matters, so the lowest-cost choice is to just live with the odd fact that (IEnumerable<int>)(object)enumValues is legal but (IEnumerable<int>)enumValues is not.

For more on this, see my 2009 article on the subject

https://blogs.msdn.microsoft.com/ericlippert/2009/09/24/why-is-covariance-of-value-typed-arrays-inconsistent/

and this related question:

Why does my C# array lose type sign information when cast to object?

Eric Lippert answered Oct 24 '22 10:10


Suppose you have a class Car and a derived class, Volvo.

Car car = new Volvo();
var type = car.GetType();

Although you cast the Volvo to a Car, the object is still a Volvo. If you ask for its type, you get typeof(Volvo).

Enumerable.Cast<TResult> will cast every element of the input sequence to a TResult, but they are still your original objects. Hence your sequence of Foos are still Foos, even after you cast them to integers.

Addition after comments

Several people commented that this is not the case with Enumerable.Cast. I thought that would be strange, but let's give it a try:

class Car
{
    public string LicensePlate {get; set;}
    public Color Color {get; set;}
    ...
}
class Volvo : Car
{
    public string OriginalCountry => "Sverige";
}

usage:

var volvos = new List<Volvo>() {new Volvo(), new Volvo(), new Volvo()};
var cars = volvos.Cast<Car>();
foreach (var car in cars) Console.WriteLine(car.GetType());

Result:

CarTest.Volvo
CarTest.Volvo
CarTest.Volvo

Conclusion: Enumerable.Cast does not change the type of the object

Harald Coppoolse answered Oct 24 '22 10:10