A few weeks ago, I discovered that strings in C# are defined as reference types and not value types. Initially I was confused about this, but then after some reading, I suddenly understood why it is important to store strings on the heap and not the stack - because it would be very inefficient to have a very large string that gets copied over an unpredictable number of stack frames. I completely accept this.
I feel that my understanding is almost complete, but there is one element that I am missing - what language feature do strings use to keep them immutable? To illustrate with a code example:
string valueA = "FirstValue";
string valueB = valueA;
valueA = "AnotherValue";
Assert.AreEqual("FirstValue", valueB); // Passes
I do not understand what language feature makes a copy of valueA when I assign it to valueB. Or perhaps, the reference to valueA does not change when I assign it to valueB, only valueA gets a new reference to itself when I set the string. As this is an instance type, I do not understand why this works.
I understand that you can overload, for example, the == and != operators, but I cannot seem to find any documentation on overloading the = operators. What is the explanation?
what language feature do strings use to keep them immutable?
It is not a language feature. It is the way the class is defined.
For example,
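class Integer {
    // readonly: the field can only be assigned in the constructor
    private readonly int value;
    public int Value { get { return this.value; } }
    public Integer(int value) { this.value = value; }
    // Returns a new instance instead of modifying this one
    public Integer Add(Integer other) {
        return new Integer(this.value + other.value);
    }
}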
is like an int, except that it's a reference type. It is still immutable, because we defined it to be so. We can define it to be mutable too:
class MutableInteger {
    // not readonly: the field can be reassigned after construction
    private int value;
    public int Value { get { return this.value; } }
    public MutableInteger(int value) { this.value = value; }
    // Modifies this instance in place and returns it
    public MutableInteger Add(MutableInteger other) {
        this.value = this.value + other.value;
        return this;
    }
}
See?
I do not understand what language feature makes a copy of valueA when I assign it to valueB.
It doesn't copy the string, it copies the reference. strings are reference types. This means that variables of type string are storage locations whose values are references. In this case, their values are references to instances of string. When you assign a variable of type string to another of type string, the value is copied. In this case, the value is a reference, and it is copied by the assignment. This is true for any reference type, not just string or only immutable reference types.
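As a rough illustration of the point that assignment copies the reference for any reference type, the sketch below contrasts string with StringBuilder, a mutable reference type from System.Text. The class name ReferenceCopyDemo and its method names are made up for this example.

```csharp
using System;
using System.Text;

public static class ReferenceCopyDemo
{
    // Assignment copies the reference, so both variables refer to one StringBuilder.
    public static string MutateThroughAlias()
    {
        StringBuilder a = new StringBuilder("First");
        StringBuilder b = a;   // copies the reference, not the object
        a.Append("Value");     // mutates the single shared object
        return b.ToString();   // the change is visible through b as well
    }

    // Reassigning a variable only changes what that variable refers to.
    public static string ReassignOriginal()
    {
        string a = "FirstValue";
        string b = a;          // b refers to the same string instance as a
        a = "AnotherValue";    // a now refers to a different instance
        return b;              // b still refers to the original instance
    }

    public static void Main()
    {
        Console.WriteLine(MutateThroughAlias());
        Console.WriteLine(ReassignOriginal());
    }
}
```

Both methods copy a reference in exactly the same way. The only difference is that StringBuilder offers methods that change the object in place, while string does not.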
Or perhaps, the reference to valueA does not change when I assign it to valueB, only valueA gets a new reference to itself when I set the string.
Nope, the values of valueA and valueB refer to the same instance of string. Their values are references, and those values are equal. If you could somehow mutate* the instance of string referred to by valueA, the mutation would be visible through both valueA and valueB.
As this is an instance type, I do not understand why this works.
There is no such thing as an instance type.
Basically, strings are reference types, but strings are immutable. When you call a method that appears to mutate a string, what you get back is a reference to a new string that is the result of applying the change to a copy of the existing string.
string s = "hello, world!";
string t = s;
string u = s.ToUpper();
Here, s and t are variables whose values refer to the same instance of string. The referent of s is not mutated by the call to String.ToUpper. Instead, s.ToUpper builds an uppercased copy of the referent of s and returns a reference to that new instance of string. We assign that reference to u.
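One way to check this for yourself is object.ReferenceEquals, which compares references rather than string contents. The following is only a sketch; ToUpperDemo and its method names are hypothetical.

```csharp
using System;

public static class ToUpperDemo
{
    // s and t hold equal references after the assignment.
    public static bool SameInstanceAfterAssignment()
    {
        string s = "hello, world!";
        string t = s;
        return object.ReferenceEquals(s, t);
    }

    // ToUpper leaves the original untouched and hands back a different instance.
    public static bool ToUpperReturnsNewInstance()
    {
        string s = "hello, world!";
        string u = s.ToUpper();
        return !object.ReferenceEquals(s, u)
            && s == "hello, world!"
            && u == "HELLO, WORLD!";
    }

    public static void Main()
    {
        Console.WriteLine(SameInstanceAfterAssignment());
        Console.WriteLine(ToUpperReturnsNewInstance());
    }
}
```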
I understand that you can overload, for example, the == and != operators, but I cannot seem to find any documentation on overloading the = operators.
You can't overload =.
* You can mutate a string, with some tricks (such as unsafe code or reflection). Ignore them.
First of all, your example works the same way for variables of any reference type, not just strings.
What happens is:
string valueA = "FirstValue"; // valueA references "FirstValue"
string valueB = valueA; // valueB references what valueA references, which is "FirstValue"
valueA = "AnotherValue"; // valueA now references a different string: "AnotherValue"
Assert.AreEqual("FirstValue", valueB); // passes: valueB still references "FirstValue"
Immutability is a different concept. It means that the string object itself can't be changed.
This will show up in a situation like this:
string valueA = "FirstValue"; // valueA references "FirstValue"
string valueB = valueA; // valueB references what valueA references, which is "FirstValue"
valueA = valueA.Replace('F','B'); // valueA now references a new string: "BirstValue"
Assert.AreEqual("FirstValue", valueB); // passes: valueB still references "FirstValue"
This is because of string's immutability: Replace doesn't change the string itself. It creates a new copy with the changes and returns a reference to it, and valueA only refers to that copy once the result is assigned back to it.
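A small sketch of how String.Replace behaves in practice: the call returns a new string, so the original is unchanged unless the result is assigned back. ReplaceDemo and its method names are invented for illustration.

```csharp
using System;

public static class ReplaceDemo
{
    // The return value is ignored, so valueA still refers to the original string.
    public static string DiscardedResult()
    {
        string valueA = "FirstValue";
        valueA.Replace('F', 'B');           // new string is created and thrown away
        return valueA;
    }

    // Assigning the result makes valueA refer to the new string; valueB is unaffected.
    public static string CapturedResult()
    {
        string valueA = "FirstValue";
        string valueB = valueA;
        valueA = valueA.Replace('F', 'B');  // valueA now refers to the new string
        return valueA + "|" + valueB;
    }

    public static void Main()
    {
        Console.WriteLine(DiscardedResult());
        Console.WriteLine(CapturedResult());
    }
}
```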
Or perhaps, the reference to valueA does not change when I assign it to valueB, only valueA gets a new reference to itself when I set the string.
That is correct. As strings are immutable, there is no problem having two variables referencing the same string object. When you assign a new string to one of them, it's the reference that is replaced, not the string object.
I cannot seem to find any documentation on overloading the = operators.
That is not due to any shortcoming on your side; there is simply no way to overload the assignment operator in C#.
The = operator is quite simple: it takes the value on the right-hand side and assigns it to the variable on the left-hand side. If the variable is of a reference type, the value is the reference, so that is what's assigned.
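To make the difference concrete, here is a minimal sketch comparing assignment of a struct (a value type) with assignment of a class (a reference type). The types PointValue and PointRef and the class AssignmentDemo are hypothetical names used only for this example.

```csharp
using System;

public struct PointValue { public int X; }   // value type
public class PointRef { public int X; }      // reference type

public static class AssignmentDemo
{
    // For a value type, = copies the whole value, so b is independent of a.
    public static int CopyStruct()
    {
        PointValue a = new PointValue { X = 1 };
        PointValue b = a;
        a.X = 2;
        return b.X;   // b kept its own copy
    }

    // For a reference type, = copies only the reference, so a and b share one object.
    public static int CopyClassReference()
    {
        PointRef a = new PointRef { X = 1 };
        PointRef b = a;
        a.X = 2;
        return b.X;   // b sees the change made through a
    }

    public static void Main()
    {
        Console.WriteLine(CopyStruct());
        Console.WriteLine(CopyClassReference());
    }
}
```

With string the second case applies, but because no public member changes a string in place, sharing the object is never observable.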