Why do reference types inside structs behave like value types?

I am a beginner in C# programming and am currently studying strings, structs, value types and reference types. According to the accepted answers here and here, strings are reference types: a pointer is stored on the stack while the actual contents are stored on the heap. Also, as claimed here, structs are value types. I tried to practice with structs and strings using a small example:

using System;

struct Person
{
    public string name;
}

class Program
{
    static void Main(string[] args)
    {
        Person person_1 = new Person();
        person_1.name = "Person 1";

        Person person_2 = person_1;
        person_2.name = "Person 2";

        Console.WriteLine(person_1.name);
        Console.WriteLine(person_2.name);
    }
}

The above code snippet outputs

Person 1
Person 2

which confuses me. If strings are reference types and structs are value types, then person_1.name and person_2.name should point to the same region on the heap, shouldn't they?

duong_dajgja asked Nov 12 '16 15:11


5 Answers

"strings are reference types that have pointers stored on stack while their actual contents stored on heap"

No no no. First off, stop thinking about stack and heap. This is almost always the wrong way to think in C#. C# manages storage lifetime for you.

Second, though references may be implemented as pointers, references are not logically pointers. References are references. C# has both references and pointers. Don't mix them up. There is no pointer to string in C#, ever. There are references to string.

Third, a reference to a string could be stored on the stack but it could also be stored on the heap. When you have an array of references to string, the array contents are on the heap.

Now let's come to your actual question.

    Person person_1 = new Person();
    person_1.name = "Person 1";
    Person person_2 = person_1; // This is the interesting line
    person_2.name = "Person 2";

Let's illustrate what the code does logically. Your Person struct is nothing more than a string reference, so your program is the same as:

string person_1_name = null; // That's what new does on a struct
person_1_name = "Person 1";
string person_2_name = person_1_name; // Now they refer to the same string
person_2_name = "Person 2"; // And now they refer to different strings

When you say person_2 = person_1 that does not mean that the variable person_1 is now an alias for the variable person_2. (There is a way to do that in C#, but this is not it.) It means "copy the contents of person_1 to person_2". The reference to the string is the value that is copied.

If that's not clear try drawing boxes for variables and arrows for references; when the struct is copied, a copy of the arrow is made, not a copy of the box.
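For contrast, here is a minimal sketch of what aliasing a variable could look like, assuming C# 7.0 or later, where ref locals are available. The names AliasDemo, Run, copy and alias are made up for this illustration and are not part of the question's code.

```csharp
using System;

struct Person
{
    public string name;
}

static class AliasDemo
{
    // Returns both names after writing through a copy and then through an alias.
    public static (string original, string copy) Run()
    {
        Person person_1 = new Person();
        person_1.name = "Person 1";

        Person copy = person_1;           // copies the struct, including its string reference
        copy.name = "Copy";               // changes only the copy's field

        ref Person alias = ref person_1;  // ref local: a second name for the variable person_1
        alias.name = "Renamed via alias"; // writes into person_1 itself

        return (person_1.name, copy.name);
    }

    static void Main()
    {
        var (original, copy) = Run();
        Console.WriteLine(original); // Renamed via alias
        Console.WriteLine(copy);     // Copy
    }
}
```

Plain assignment produced an independent box; only the ref local shares the box with person_1.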

Eric Lippert answered Nov 03 '22 23:11


I would highlight the fact that person_2.name = "Person 2" does not modify the existing string. It makes the field refer to a different string object in memory, one that contains the value "Person 2", and it is the reference to this object that we are assigning. You can imagine it as follows:

class StringClass
{
    private readonly string value; // let's imagine this is a "value type" string, so it's like int

    public StringClass(string value)
    {
        this.value = value;
    }

    public override string ToString() { return value; } // lets Console.WriteLine print the text
}

With person_2.name = "Person 2" you are actually doing something like person_2.name = new StringClass("Person 2"), while name itself holds just a value that represents an address in memory.

Now if I rewrite your code:

struct Person
{
    public StringClass name;
}

class Program
{
    static void Main(string[] args)
    {
        Person person_1 = new Person();
        person_1.name = new StringClass("Person 1"); // imagine the reference value of name is "m1", which points somewhere into the memory where "Person 1" is saved

        Person person_2 = person_1; // person_2.name holds the same reference, that is "m1", which was copied from person_1.name
        person_2.name = new StringClass("Person 2"); // person_2.name now holds a new reference "m2" to a new StringClass object in memory; person_1.name still has the value "m1"

        person_1.name = person_2.name; // this copies the new reference "m2" back to the original struct

        Console.WriteLine(person_1.name);
        Console.WriteLine(person_2.name);
    }
}

Now the output of the snippet:

Person 2
Person 2 

For an assignment to person_2.name in your original snippet to also change person_1.name, person_2 would have to be an alias for person_1 rather than a copy; with a struct, that requires ref: https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/ref
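One common place where ref shows up is parameter passing. The sketch below is only an illustration under that assumption; RenameCopy and RenameInPlace are hypothetical method names, not something from the question.

```csharp
using System;

struct Person
{
    public string name;
}

static class RefParameterDemo
{
    // Receives a copy of the struct; the caller's variable is untouched.
    public static void RenameCopy(Person p, string newName)
    {
        p.name = newName;
    }

    // Receives an alias for the caller's variable; the change is visible to the caller.
    public static void RenameInPlace(ref Person p, string newName)
    {
        p.name = newName;
    }

    static void Main()
    {
        Person person_1 = new Person { name = "Person 1" };

        RenameCopy(person_1, "Person 2");
        Console.WriteLine(person_1.name); // Person 1

        RenameInPlace(ref person_1, "Person 2");
        Console.WriteLine(person_1.name); // Person 2
    }
}
```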

Patronaut answered Sep 21 '22 17:09


The best way to understand this is to fully understand what variables are; variables are, simply put, placeholders that hold values.

So what exactly is this value? In a reference type, the value stored in the variable is the reference (the address so to speak) to a given object. In a value type, the value is the object itself.

When you do AnyType y = x; what really happens is that a copy of the value stored in x is made and is then stored in y.

So if x is a reference type, both x and y will point to the same object because they will both hold identical copies of the same reference. If x is a value type then both x and y will hold two identical but distinct objects.
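A small sketch of that rule, using two made-up types (ValueBox and ObjectBox are hypothetical names chosen for this illustration) that differ only in being a struct or a class:

```csharp
using System;

struct ValueBox { public int Number; }   // hypothetical value type
class ObjectBox { public int Number; }   // hypothetical reference type

static class CopyDemo
{
    static void Main()
    {
        ValueBox x1 = new ValueBox { Number = 1 };
        ValueBox y1 = x1;      // y1 receives a copy of the object itself
        y1.Number = 2;
        Console.WriteLine(x1.Number); // 1: x1 and y1 are distinct objects

        ObjectBox x2 = new ObjectBox { Number = 1 };
        ObjectBox y2 = x2;     // y2 receives a copy of the reference
        y2.Number = 2;
        Console.WriteLine(x2.Number); // 2: x2 and y2 refer to the same object
    }
}
```

In both cases the assignment copies the value stored in the variable; what differs is whether that value is the object or a reference to it.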

Once you understand this, it should start to make sense why your code behaves the way it does. Let's study it step by step:

Person person_1 = new Person();

OK, we are creating a new instance of a value type. According to what I explained previously, the value stored in person_1 is the newly created object itself. Where this value is stored (heap or stack) is an implementation detail; it's not relevant at all to how your code behaves.

person_1.name = "Person 1";

Now we are setting the variable name which happens to be a field of person_1. Again according to previous explanations, the value of name is a reference pointing to somewhere in memory where the string "Person 1" is stored. Again, where the value or the string are stored is irrelevant.

Person person_2 = person_1;

Ok, this is the interesting part. What happens here? Well, a copy of the value stored in person_1 is made and stored in person_2. Because the value happens to be an instance of a value type, a new copy of said instance is created and stored in person_2. This new copy has its own field name and the value stored in this variable is, again, a copy of the value stored in person_1.name (a reference to "Person 1").

person_2.name = "Person 2";

Now we are simply reassigning the variable person_2.name. This means we are storing a new reference that points to a different string somewhere in memory. Note that person_2.name originally held a copy of the value stored in person_1.name, so whatever you do to person_2.name has no effect on whatever value is stored in person_1.name because you are simply changing... yeah, exactly, a copy. And that's why your code behaves the way it does.

As an exercise, try to reason out in a similar way how your code would behave if Person were a reference type.

InBetween answered Nov 04 '22 01:11


Each struct instance has its own fields. person_1.name is an independent variable from person_2.name. These are not static fields.

person_2 = person_1 copies the struct by value.

The fact that string is immutable is not required to explain this behavior.

Here's the same case with a class instead to demonstrate the difference:

class C { public string S; }

C c1 = new C();
C c2 = c1; //copy reference, share object
c1.S = "x"; //it appears that c2.S has been set simultaneously because it's the same object

Here, c1.S and c2.S refer to the same variable. If you make this a struct then they become different variables (as in your code). c2 = c1 then turns into a copy of the struct value where it previously was a copy of an object reference.

usr answered Nov 04 '22 00:11


Think of strings as arrays of characters. The code below is similar to yours, but with arrays.

public struct Lottery
{
    public int[] numbers;
}

public static void Main()
{
    var A = new Lottery();
    A.numbers = new[] { 1,2,3,4,5 };
    // struct A is in the stack, and it contains one reference to an array in RAM

    var B = A;
    // struct B also is in the stack, and it contains a copy of A.numbers reference
    B.numbers[0] = 10;
    // A.numbers[0] == 10, since both A.numbers and B.numbers point to same memory
    // You can't do this with strings because they are immutable

    B.numbers = new int[] { 6,7,8,9,10 };
    // B.numbers now points to a new location in RAM
    B.numbers[0] = 60;
    // A.numbers[0] == 10, B.numbers[0] == 60        
    // The two structures A and B *are completely separate* now.
}

So if you have a structure that contains references (strings, arrays or classes) and you want to implement ICloneable, make sure you also clone the contents of the references.

public class Person : ICloneable
{
    public string Name { get; set; }

    public Person Clone()
    {
        return new Person() { Name=this.Name }; // copies the reference; safe to share because strings are immutable
    }
    object ICloneable.Clone() { return Clone(); } // interface calls specific function
}
public struct Project : ICloneable
{
    public Person Leader { get; set; }
    public string Name { get; set; }
    public int[] Steps { get; set; }

    public Project Clone()
    {
        return new Project()
        {
            Leader=this.Leader?.Clone(),        // calls Clone for a separate Person (null stays null)
            Name=this.Name,                     // copies the reference; strings are immutable
            Steps=this.Steps?.Clone() as int[]  // new array; a shallow copy is enough for ints
        };
    }
    object ICloneable.Clone() { return Clone(); } // interface calls specific function
}
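As a quick check of the difference between a plain struct copy and a cloning copy, here is a minimal sketch that reuses the Lottery idea from above. DeepCopy is a hypothetical helper name introduced only for this illustration.

```csharp
using System;

struct Lottery
{
    public int[] numbers;

    // Hypothetical helper: copies the array as well as the struct.
    public Lottery DeepCopy()
    {
        return new Lottery { numbers = (int[])numbers?.Clone() };
    }
}

static class CloneDemo
{
    static void Main()
    {
        var A = new Lottery { numbers = new[] { 1, 2, 3, 4, 5 } };

        var shared = A;                // struct copy: both refer to the same array
        var separate = A.DeepCopy();   // struct copy plus its own array

        separate.numbers[0] = 60;
        Console.WriteLine(A.numbers[0]); // 1, the clone has its own array

        shared.numbers[0] = 10;
        Console.WriteLine(A.numbers[0]); // 10, the plain copy shares the array
    }
}
```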

John Alexiou answered Nov 03 '22 23:11