Non-nullable reference types' default values VS non-nullable value types' default values

This isn't my first question about nullable reference types, as I've been experimenting with the feature for a few months now. But the more I experiment with it, the more confused I get and the less value I see in it.

Take this code, for example:

string? nullableString = default(string?);
string nonNullableString = default(string);

int? nullableInt = default(int?);
int nonNullableInt = default(int);

Executing that gives:

nullableString => null

nonNullableString => null

nullableInt => null

nonNullableInt => 0

The default value of a (non-nullable) integer has always been 0, but to me it doesn't make sense that a non-nullable string's default value is null. Why this choice? It goes against the non-nullable principles we've always been used to. I think a non-nullable string's default value should have been String.Empty.

I mean, somewhere deep down in the implementation of C# it must be specified that 0 is the default value of an int. We could just as well have chosen 1 or 2, but no, the consensus is 0. So can't we just specify that the default value of a string is String.Empty when the nullable reference types feature is activated? Moreover, it seems Microsoft would like to activate it by default for .NET 5 projects in the near future, so this feature would become the normal behavior.

Now same example with an object:

Foo? nullableFoo = default(Foo?);
Foo nonNullableFoo = default(Foo);

This gives:

nullableFoo => null

nonNullableFoo => null

Again this doesn't make sense to me; in my opinion the default value of a Foo should be new Foo() (or a compile error if no parameterless constructor is available). Why set an object that isn't supposed to be null to null by default?

Now, extending this question even further:

string nonNullableString = null;
int nonNullableInt = null;

The compiler gives a warning for the 1st line, which can be turned into an error with a simple setting in the .csproj file: <WarningsAsErrors>CS8600</WarningsAsErrors>. And it gives a compilation error for the 2nd line, as expected.

So the behavior of non-nullable value types and non-nullable reference types isn't the same, but that's acceptable since I can override it.

However when doing that:

string nonNullableString = null!;
int nonNullableInt = null!;

The compiler is completely fine with the 1st line, no warning at all. I discovered null! recently while experimenting with the nullable reference types feature, and I was expecting the compiler to be fine with the 2nd line too, but it isn't. Now I'm just really confused as to why Microsoft decided to implement different behaviors.

Considering it doesn't protect at all against having null in a non-nullable reference type variable, it seems this new feature doesn't change anything and doesn't improve developers' lives at all (as opposed to non-nullable value types, which could NOT be null and therefore never need to be null-checked).

So in the end it seems the only value added is in terms of signatures. Developers can now be explicit about whether a method's return value or a property can be null (for example in a C# representation of a database table where NULL is an allowed value in a column).
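For example, something like this (a hypothetical repository interface, just to illustrate what I mean by being explicit in signatures):

```csharp
#nullable enable
public class Customer { }

public interface ICustomerRepository
{
    // The annotation tells callers this lookup may legitimately return null...
    Customer? FindById(int id);

    // ...while this method promises to never return null (it might throw instead).
    Customer GetById(int id);
}
```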

Besides that, I don't see how I can use this new feature effectively. Could you please give me other useful examples of how you use nullable reference types? I would really like to make good use of this feature to improve my life as a developer, but I really don't see how...

Thank you

Jérôme MEVEL asked Aug 26 '20


2 Answers

Again this doesn't make sense to me, in my opinion the default value of a Foo should be new Foo() (or gives a compile error if no parameterless constructor is available)

That's an opinion, but: that isn't how it is implemented. default means null for reference types, even if that's invalid according to nullability rules. The compiler spots this and warns you on the line Foo nonNullableFoo = default(Foo);:

Warning CS8600 Converting null literal or possible null value to non-nullable type.
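A minimal repro of that warning, compiled with nullable reference types enabled:

```csharp
#nullable enable
class Foo { }

class Program
{
    static void Main()
    {
        // warning CS8600: Converting null literal or possible null value
        // to non-nullable type.
        Foo nonNullableFoo = default(Foo);

        // The warning is compile-time only; at runtime the variable is null.
        System.Console.WriteLine(nonNullableFoo is null); // True
    }
}
```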

As for string nonNullableString = null!; and

The compiler is completely fine with the 1st line, no warning at all.

You told it to ignore it; that's what the ! means. If you tell the compiler to not complain about something, it isn't valid to complain that it didn't complain.

So at the end it seems the only value added is just in terms of signatures.

No, it has lots more validity, but if you ignore the warnings that it does raise (CS8600 above), and if you suppress the other things that it does for you (!): yes, it will be less useful. So... don't do that?

Marc Gravell answered Oct 13 '22


You are very confused by how programming language design works.

Default values

The default value of a (non-nullable) integer has always been 0, but to me it doesn't make sense that a non-nullable string's default value is null. Why this choice? This is completely against the non-nullable principles we've always been used to. I think a non-nullable string's default value should have been String.Empty.

Default values for variables are a basic feature of the language that is in C# since the very beginning. The specification defines the default values:

For a variable of a value_type, the default value is the same as the value computed by the value_type's default constructor ([see] Default constructors). For a variable of a reference_type, the default value is null.

This makes sense from a practical standpoint, as one of the basic uses of defaults is declaring a new array of values of a given type. Thanks to this definition, the runtime can just zero all the bits in the allocated array - default constructors for value types always produce all-zero values in all fields, and null is represented as an all-zero reference. That's literally the next line in the spec:

Initialization to default values is typically done by having the memory manager or garbage collector initialize memory to all-bits-zero before it is allocated for use. For this reason, it is convenient to use all-bits-zero to represent the null reference.

Now, the Nullable Reference Types feature (NRT) was released last year with C# 8. The choice here is not "let's make default values null in spite of NRT" but rather "let's not waste time and resources completely reworking how the default keyword works just because we're introducing NRTs". NRTs are annotations for programmers; by design they have zero impact on the runtime.

I would argue that not being able to specify default values for reference types is a similar case to not being able to define a parameterless constructor on a value type - the runtime needs a fast all-zero default, and null is a reasonable default for reference types. Not all types have a reasonable default value anyway - what is a reasonable default for a TcpClient?

If you want your own custom default, implement a static Default method or property and document it so that the developers can use that as a default for that type. No need to change the fundamentals of the language.
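A sketch of that convention - the Widget type and its chosen default are hypothetical:

```csharp
public sealed class Widget
{
    public string Name { get; }

    public Widget(string name) => Name = name;

    // A documented, explicit default instance for this type,
    // as opposed to default(Widget), which is null.
    public static Widget Default { get; } = new Widget("unnamed");
}
```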

I mean somewhere deep down in the implementation of C# it must be specified that 0 is the default value of an int. We also could have chosen 1 or 2 but no, the consensus is 0. So can't we just specify the default value of a string is String.Empty when the Nullable reference type feature is activated?

As I said, the "deep down" is that zeroing a range of memory is blazingly fast and convenient. There is no runtime component responsible for checking what the default of a given type is and repeating that value across an array when you create a new one, since that would be horribly inefficient.

Your proposal would basically mean that the runtime would have to somehow inspect the nullability metadata of strings at runtime and treat an all-zero non-nullable string value as an empty string. This would be a very involved change digging deep into the runtime just for this one special case of an empty string. It's much more cost-efficient to just use a static analyzer to warn you when you're assigning null instead of a sensible default to a non-nullable string. Fortunately we have such analyzer, namely the NRT feature, which consistently refuses to compile my classes that contain definitions like this:

string Foo { get; set; }

by issuing a warning and forcing me to change it to:

string Foo { get; set; } = "";

(I recommend turning on Treat Warnings As Errors by the way, but it's a matter of taste.)
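In project-file terms, that could look something like this (a sketch of the relevant MSBuild properties):

```xml
<PropertyGroup>
  <Nullable>enable</Nullable>
  <TreatWarningsAsErrors>true</TreatWarningsAsErrors>
</PropertyGroup>
```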

Again this doesn't make sense to me, in my opinion the default value of a Foo should be new Foo() (or gives a compile error if no parameterless constructor is available). Why by default setting to null an object that isn't supposed to be null?

This would, among other things, make it impossible to declare an array of a reference type without a default constructor. Most basic collections use an array as the underlying storage, including List<T>. And it would require allocating N default instances of a type whenever you make an array of size N, which is, again, horribly inefficient. Also, constructors can have side effects. I'm not going to ponder how many more things this would break; suffice it to say, it's hardly an easy change to make. Considering how complicated NRT was to implement anyway (the NullableReferenceTypesTests.cs file in the Roslyn repo has ~130,000 lines of code alone), the cost-efficiency of introducing such a change is... not great.

The bang operator (!) and Nullable Value Types

The compiler is completely fine with the 1st line, no warning at all. I discovered null! recently when experiencing with the nullable reference type feature and I was expecting the compiler to be fine for the 2nd line too but this isn't the case. Now I'm just really confused as for why Microsoft decided to implement different behaviors.

The null value is valid only for reference types and nullable value types. Nullable types are, again, defined in the spec:

A nullable type can represent all values of its underlying type plus an additional null value. A nullable type is written T?, where T is the underlying type. This syntax is shorthand for System.Nullable<T>, and the two forms can be used interchangeably. (...) An instance of a nullable type T? has two public read-only properties:

  • A HasValue property of type bool
  • A Value property of type T

An instance for which HasValue is true is said to be non-null. A non-null instance contains a known value and Value returns that value.

The reason you can't assign null to an int is rather obvious: int is a value type that takes 32 bits and represents an integer. The null value is a special reference value that is machine-word sized and represents a location in memory. Assigning null to an int has no sensible semantics. Nullable<T> exists specifically to allow null assignments to value types to represent "no value" scenarios. But note that doing

int? x = null;

is purely syntactic sugar. The all-zero value of Nullable<T> is the "no value" scenario, since it means that HasValue is false. There is no magic null value being assigned anywhere; it's the same as saying = default - it just creates a new all-zero struct of the given type T and assigns it.
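A small illustration of that equivalence:

```csharp
class NullableDemo
{
    static void Main()
    {
        int? x = null;    // syntactic sugar for: Nullable<int> x = default;
        int? y = default; // an all-zero struct: HasValue is false

        System.Console.WriteLine(x.HasValue); // False
        System.Console.WriteLine(x == y);     // True - both are the "no value" instance
    }
}
```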

So again, the answer is: no one deliberately designed this to work incompatibly with NRTs. Nullable value types are a much more fundamental feature of the language that has worked like this since its introduction in C# 2. And the way you propose it to work doesn't translate to a sensible implementation - would you want all value types to be nullable? Then all of them would have to carry the HasValue field, which takes an additional byte and possibly screws up padding (I think a language that represents an int as a 40-bit type and not 32 would be considered heretical :) ).

The bang operator is used specifically to tell the compiler "I know that I'm dereferencing a nullable / assigning null to a non-nullable, but I'm smarter than you and I know for a fact this is not going to break anything". It disables static analysis warnings. It does not magically expand the underlying type to accommodate a null value.
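A sketch of the intended use of !, where the programmer knows more than the flow analysis does (FindUser and its contract are hypothetical):

```csharp
#nullable enable
using System.Collections.Generic;

class User { public string Name = ""; }

class Repo
{
    static readonly Dictionary<int, User> users =
        new() { [1] = new User { Name = "Ada" } };

    // The annotation is honest: an unknown id yields null.
    static User? FindUser(int id) =>
        users.TryGetValue(id, out var u) ? u : null;

    static void Main()
    {
        // We just populated id 1 above, so this lookup cannot fail here;
        // ! silences the dereference warning with zero runtime effect.
        System.Console.WriteLine(FindUser(1)!.Name); // prints "Ada"
    }
}
```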

Summary

Considering it doesn't protect at all against having null in a non-nullable reference type variable, it seems this new feature doesn't change anything and doesn't improve developers' lives at all (as opposed to non-nullable value types which could NOT be null and therefore don't need to be null-checked)

So at the end it seems the only value added is just in terms of signatures. Developers can now be explicit whether or not a method's return value could be null or not or if a property could be null or not (for example in a C# representation of a database table where NULL is an allowed value in a column).

From the official docs on NRTs:

This new feature provides significant benefits over the handling of reference variables in earlier versions of C# where the design intent can't be determined from the variable declaration. The compiler didn't provide safety against null reference exceptions for reference types (...) These warnings are emitted at compile time. The compiler doesn't add any null checks or other runtime constructs in a nullable context. At runtime, a nullable reference and a non-nullable reference are equivalent.

So you're right in that "the only value added is just in terms of signatures" and static analysis - which is the reason we have signatures in the first place. And that is not an improvement on developers' lives? Note that your line

string nonNullableString = default(string);

gives off a warning. If you did not ignore it (or even better, had Treat Warnings As Errors on) you'd get value - the compiler found a bug in your code for you.

Does it protect you from assigning null to a non-nullable reference type at runtime? No. Does it improve developers' lives? A thousand times yes. The power of the feature comes from the warnings and nullable analysis done at compile time. If you ignore the warnings issued by NRT, you do so at your own peril. The fact that you can ignore the compiler's helpful hand does not make it useless. After all, you can just as well put your entire code in an unsafe context and program in C; that doesn't mean C# is useless since you can circumvent its safety guarantees.

V0ldek answered Oct 13 '22