 

Dilemma with using value types with `new` operator in C#

When operator new() is used with a reference type, space for the instance is allocated on the heap and the reference variable itself is placed on the stack. Besides that, everything within the instance of the reference type that is allocated on the heap is zeroed out.
For example, here is a class:

class Person
{
    public int id;
    public string name;
}

In the following code:

class PersonDemo
{
    static void Main()
    {
        Person p = new Person();
        Console.WriteLine("id: {0}  name: {1}", p.id, p.name);
    }
}

The p variable is on the stack and the created instance of Person (all of its members) is on the heap. p.id would be 0 and p.name would be null. This would be the case because everything allocated on the heap is zeroed out.
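The zeroing part of that description can be checked directly. The following is a minimal sketch that redeclares the Person class from above so the snippet compiles on its own. The helper FieldsAreDefault is a hypothetical name used only for illustration. The snippet says nothing about where p itself is stored.

```csharp
using System;

// Same shape as the Person class in the question.
class Person
{
    public int id;
    public string name;
}

static class PersonDemo
{
    // Hypothetical helper: builds a fresh Person and checks that both fields hold their defaults.
    public static bool FieldsAreDefault()
    {
        Person p = new Person();
        return p.id == 0 && p.name == null;
    }

    static void Main()
    {
        Person p = new Person();
        // id is 0 and name is null: the new instance starts out zeroed.
        Console.WriteLine("id: {0}  name: {1}", p.id, p.name ?? "(null)");
        Console.WriteLine(FieldsAreDefault());
    }
}
```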

Now, what confuses me is what happens when I use the new operator with a value type. For example, consider the following structure:

struct Date
{
    public int year;
    public int month;
    public int day;
}

class DateDemo
{
    static void Main()
    {
        Date someDate;
        someDate = new Date();

        Console.WriteLine("someDate is: {0}/{1}/{2}", 
            someDate.month, someDate.day, someDate.year);
    }
}

Now I would like to know what the following lines from Main do:

        Date someDate;
        someDate = new Date();

In the first line, the someDate variable is allocated on the stack: precisely 12 bytes.
My question is: what happens on the second line? What does operator new() do? Does it only zero out the members of the Date structure, or does it allocate space on the heap as well? On the one hand, I wouldn't expect new to allocate space on the heap, because in the first line memory is already allocated on the stack for the structure instance. On the other hand, I would expect new to allocate space on the heap and return the address of that space, because that's what new should do. Maybe this is because I'm coming from a C++ background.

Nevertheless, if the answer is "when new is used with value types, it only zeroes out the members of the object", then the meaning of the new operator is a bit inconsistent, because:

  1. when using new with value types, it only zeroes out the members of the object on the stack
  2. when using new with reference types, it allocates memory on the heap for the instance and zeroes out its members

Thanks in advance,
Cheers

dragan.stepanovic asked Apr 06 '11 08:04



2 Answers

First let me correct your errors.

When operator new() is used with a reference type, space for the instance is allocated on the heap and the reference variable itself is placed on the stack.

The reference that is the result of "new" is a value, not a variable. The value refers to a storage location.

The reference is of course returned in a CPU register. Whether the contents of that CPU register are ever copied to the call stack is a matter for the jitter's optimizer to decide. It need not ever live on the stack; it could live forever in registers, or it could be copied directly from the register to the managed heap, or, in unsafe code, it could be copied directly to unmanaged memory.

The stack is an implementation detail. You don't know when the stack is being used unless you look at the jitted code.

The p variable is on the stack and the created instance of Person (all of its members) is on the heap. p.id would be 0 and p.name would be null.

Correct, though of course again p could be realized as a register if the jitter so decides. It need not use the stack if there are available registers.

You seem pretty hung up on this idea that the stack is being used. The jitter might have a large number of registers at its disposal, and those registers can be pretty big.

I'm coming from a C++ background.

Ah, that explains why you're so hung up on this stack vs heap thing. Learn to stop worrying about it. We've designed a managed memory environment where things live as long as they need to. Whether the manager chooses to use stack, heap or registers to efficiently manage the memory is up to it.

In the first line, the someDate variable is allocated on the stack: precisely 12 bytes.

Let's suppose for the sake of argument that this 12 byte structure is allocated on the stack. Seems reasonable.

My question is: what happens on the second line? What does operator new() do? Does it only zero out the members of the Date structure, or does it allocate space on the heap as well?

The question presupposes a false dichotomy and is therefore impossible to answer as stated. The question presents two either-or alternatives, neither of which is necessarily correct.

On the one hand, I wouldn't expect new to allocate space on the heap, because in the first line memory is already allocated on the stack for the structure instance.

Correct conclusion, specious reasoning. No heap allocation is performed because the compiler knows that no part of this operation requires long-lived storage. That's what the heap is for; when the compiler determines that a given variable might live longer than the current method activation, it generates code which allocates the storage for that variable on the long-lived "heap" storage. If it determines that the variable definitely has a short lifetime then it uses the stack (or registers), as an optimization.
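The classic case of a local that "might live longer than the current method activation" is a local captured by a lambda. The following is a minimal sketch. The names LifetimeDemo and MakeCounter are hypothetical. The remark about a heap-allocated closure object describes how the Microsoft C# compiler implements captured locals, not a language guarantee.

```csharp
using System;

static class LifetimeDemo
{
    // "count" is declared as a local of MakeCounter, but the returned lambda keeps
    // using it after MakeCounter has returned. The compiler therefore stores it in a
    // compiler-generated closure object on the heap instead of a stack slot.
    public static Func<int> MakeCounter()
    {
        int count = 0;
        return () => ++count;
    }

    static void Main()
    {
        Func<int> next = MakeCounter();
        next();
        next();
        Console.WriteLine(next()); // 3: the captured int outlived MakeCounter's activation
    }
}
```

Each call to MakeCounter gets its own captured variable, so two counters do not interfere with each other.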

On the other hand, I would expect new to allocate space on the heap and return the address of that space, because that's what new should do.

Incorrect. "new" does not guarantee that heap memory is allocated. Rather, "new" guarantees that a constructor is called on zeroed-out memory.
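The "zeroed-out memory" part is observable from inside a constructor. This sketch uses a class because a class constructor may read a field before assigning it. Widget and its members are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical type, for illustration only.
class Widget
{
    public int size;
    public readonly int sizeSeenByConstructor;

    public Widget(int requested)
    {
        // The object's memory is zeroed before this body runs,
        // so "size" still holds 0 at this point.
        sizeSeenByConstructor = size;
        size = requested;
    }
}

static class ConstructorDemo
{
    static void Main()
    {
        Widget w = new Widget(42);
        // prints "seen in constructor: 0, after: 42"
        Console.WriteLine("seen in constructor: {0}, after: {1}", w.sizeSeenByConstructor, w.size);
    }
}
```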

Let's go back to your question:

Does it only zero out the members of the Date structure, or does it allocate space on the heap as well?

We know it does not allocate space on the heap. Does it zero out members of the date structure?

That's a complicated question. The specification says that what happens when you say

someDate = new Date();

is this:

  • the address of someDate is determined
  • space is allocated (off "the stack") for the new object. It is zeroed out.
  • then the constructor, if any, is called, with "this" being a reference to the new stack storage
  • then the bytes of the new stack storage are copied to the address of someDate.

Now, is that actually what happens? You would be perfectly within your rights to notice that it is impossible to tell whether new stack space is allocated, initialized and copied, or whether the "old" stack space is initialized.

The answer is that in cases where the compiler deduces that it is impossible for the user to notice that the existing stack space is being mutated, the existing stack space is mutated and the extra allocation and subsequent copy are elided.

In cases where the compiler is unable to deduce that, then a temporary stack slot is created, initialized to zeros, constructed, mutated by the constructor, and then the resulting value is copied to the variable. This ensures that if the constructor throws an exception, you cannot observe an inconsistent state in the variable.
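The exception-safety guarantee can be sketched with a struct whose constructor writes its fields and then throws. Point, its fail parameter, and ExceptionSafetyDemo are hypothetical names used only for illustration. Under the semantics described above, the constructor writes into temporary storage, so p.x is expected to remain 1 after the failed assignment.

```csharp
using System;

// Hypothetical struct whose constructor can be told to fail after writing its fields.
struct Point
{
    public int x;
    public int y;

    public Point(int x, int y, bool fail)
    {
        this.x = x;
        this.y = y;
        if (fail)
            throw new InvalidOperationException("constructor failed");
    }
}

static class ExceptionSafetyDemo
{
    // Returns p.x after a throwing constructor was used to "reassign" p.
    public static int XAfterFailedAssignment()
    {
        Point p = new Point(1, 1, false);
        try
        {
            // The constructor sets x = 2 before throwing; per the semantics above it
            // runs on temporary storage, so p is never left partially updated.
            p = new Point(2, 2, true);
        }
        catch (InvalidOperationException)
        {
            // Ignore the failure; p still holds its previous value.
        }
        return p.x;
    }

    static void Main()
    {
        Console.WriteLine(XAfterFailedAssignment());
    }
}
```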

For more details about this issue and its analysis by the compiler see my article on the subject.

https://ericlippert.com/2010/10/11/debunking-another-myth-about-value-types/

Eric Lippert answered Nov 15 '22 15:11


OK, here is a simple one:

class Program
{
    static void Main(string[] args)
    {
        DateTime dateTime = new DateTime();
        dateTime = new DateTime();
        Console.Read();
    }
}

which compiles to this IL code:

.method private hidebysig static void  Main(string[] args) cil managed
{
  .entrypoint
  // Code size       24 (0x18)
  .maxstack  1
  .locals init ([0] valuetype [mscorlib]System.DateTime dateTime)
  IL_0000:  nop
  IL_0001:  ldloca.s   dateTime
  IL_0003:  initobj    [mscorlib]System.DateTime
  IL_0009:  ldloca.s   dateTime
  IL_000b:  initobj    [mscorlib]System.DateTime
  IL_0011:  call       int32 [mscorlib]System.Console::Read()
  IL_0016:  pop
  IL_0017:  ret
} // end of method Program::Main

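As you can see, the compiler reuses the same local variable slot to store the new value. No constructor is actually called: for new DateTime() with no arguments, the compiler emits initobj. Per the CLI specification, initobj initializes every field of the value type at the given address to zero (or null). Unlike newobj, it does not call a constructor, so it simply zeroes the memory. Exactly how the JIT turns initobj into machine code is a CLR implementation detail.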
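Because initobj only zeroes the value, new DateTime() with no arguments is the same value as default(DateTime), which is zero ticks, the same as DateTime.MinValue. This is a minimal sketch; DateTimeDefaults and its helper names are hypothetical.

```csharp
using System;

static class DateTimeDefaults
{
    // new DateTime() with no arguments compiles to initobj: no constructor runs,
    // the value is simply zeroed, which for DateTime means zero ticks.
    public static bool NewIsZeroTicks() => new DateTime().Ticks == 0;

    public static bool NewEqualsDefaultAndMinValue()
    {
        DateTime viaNew = new DateTime();
        return viaNew == default(DateTime) && viaNew == DateTime.MinValue;
    }

    static void Main()
    {
        Console.WriteLine(NewIsZeroTicks());              // True
        Console.WriteLine(NewEqualsDefaultAndMinValue()); // True
    }
}
```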

In reality, as Eric Lippert explains, there is no general rule that value types are allocated on the stack. This is purely down to the implementation of the CLR.
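Two everyday cases where a value type plainly lives on the heap are array elements and boxed copies. This sketch reuses the shape of the question's Date struct. HeapValueTypes and its helper names are hypothetical.

```csharp
using System;

// Same shape as the Date struct in the question.
struct Date
{
    public int year;
    public int month;
    public int day;
}

static class HeapValueTypes
{
    // An array is a heap object; its Date elements are stored inline inside it
    // and start out zeroed, without any "new Date()" per element.
    public static int[] YearsInArray()
    {
        Date[] dates = new Date[3];
        dates[1].year = 2011;   // modifies the element in place inside the array
        return new[] { dates[0].year, dates[1].year, dates[2].year };
    }

    // Boxing copies the struct's value into a new heap object.
    public static int BoxedYearAfterChangingLocal()
    {
        Date local = new Date();
        object boxed = local;   // the copy now lives on the heap
        local.year = 2011;      // changes the local only, not the boxed copy
        return ((Date)boxed).year;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", YearsInArray())); // 0, 2011, 0
        Console.WriteLine(BoxedYearAfterChangingLocal());     // 0
    }
}
```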

Aliostad answered Nov 15 '22 14:11