
Why is boxing a primitive value-type in .NET uncached, unlike Java?

Consider:

int a = 42;

// Reference equality on two boxed ints with the same value
Console.WriteLine( (object)a == (object)a ); // False

// Same thing - listed only for clarity
Console.WriteLine(ReferenceEquals(a, a));  // False

Clearly, each boxing instruction allocates a separate instance of a boxed Int32, which is why reference-equality between them fails. This page appears to indicate that this is specified behaviour:

The box instruction converts the 'raw' (unboxed) value type into an object reference (type O). This is accomplished by creating a new object and copying the data from the value type into the newly allocated object.
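As a minimal sketch of what the quoted passage implies (the class and method names here are illustrative, not from the question): two separate box operations on the same int should produce two distinct objects, while value equality through Equals still holds.

```csharp
using System;

public static class BoxingDemo
{
    // Boxes the same int twice and compares the two resulting references.
    public static bool SameReference(int value)
    {
        object first = value;   // box #1: allocates a new object
        object second = value;  // box #2: allocates another new object
        return ReferenceEquals(first, second);
    }

    // Boxes the same int twice and compares the wrapped values.
    public static bool SameValue(int value)
    {
        object first = value;
        object second = value;
        return first.Equals(second); // Int32.Equals compares the underlying ints
    }

    public static void Main()
    {
        Console.WriteLine(SameReference(42)); // False
        Console.WriteLine(SameValue(42));     // True
    }
}
```

Code that compares boxed values with Equals rather than == on object is unaffected by whether boxes are shared or not.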

But why does this have to be the case? Is there any compelling reason why the CLR does not choose to hold a "cache" of boxed Int32s, or even stronger, common values for all primitive value-types (which are all immutable)? I know Java has something like this.

In the days before generics, wouldn't it have helped a lot in reducing the memory requirements as well as the GC workload for a large ArrayList consisting mainly of small integers? I'm also sure that there exist several modern .NET applications that do use generics, but for whatever reason (reflection, interface assignments etc.) run up large boxing allocations that could be massively reduced with (what appears to be) a simple optimization.

So what's the reason? Some performance implication I haven't considered (I doubt if testing that the item is in the cache etc. will result in a net performance loss, but what do I know)? Implementation difficulties? Issues with unsafe code? Breaking backwards compatibility (I can't think of any good reason why a well-written program should rely on the existing behaviour)? Or something else?

EDIT: What I was really suggesting was a static cache of "commonly-occurring" primitives, much like what Java does. For an example implementation, see Jon Skeet's answer. I understand that doing this for arbitrary, possibly mutable, value-types or dynamically "memoizing" instances at run-time is a completely different matter.

EDIT: Changed title for clarity.

Ani asked Nov 23 '10 14:11


4 Answers

One reason which I find compelling is consistency. As you say, Java does cache boxed values in a certain range... which means it's all too easy to write code which works for a while:

// Passes in all my tests. Shame it fails if they're > 127...
if (value1 == value2) {
    // Do something
}

I've been bitten by this - admittedly in a test rather than production code, fortunately, but it's still nasty to have something which changes behaviour significantly outside a given range.

Don't forget that any conditional behaviour also incurs a cost on all boxing operations - so in cases where it wouldn't use the cache, you'd actually find that it was slower (because it would first have to check whether or not to use the cache).

If you really want to write your own caching box operation, of course, you can do so:

public static class Int32Extensions
{
    // Pre-boxed instances for the values -128..127.
    private static readonly object[] BoxedIntegers = CreateCache();

    private static object[] CreateCache()
    {
        object[] ret = new object[256];
        for (int i = -128; i < 128; i++)
        {
            ret[i + 128] = i; // boxes i once, up front
        }
        return ret;
    }

    // Extension methods must be static.
    public static object Box(this int i)
    {
        return (i >= -128 && i < 128) ? BoxedIntegers[i + 128] : (object) i;
    }
}

Then use it like this:

object y = 100.Box();
object z = 100.Box();

if (y == z)
{
    // Cache is working
}

Jon Skeet answered Nov 11 '22 14:11


I can't claim to be able to read minds, but here are a couple of factors:

1) Caching the value types can make for unpredictability: comparing two boxed values that are equal could be true or false depending on cache hits and implementation. Ouch!

2) The lifetime of a boxed value type is most likely short, so how long do you hold the value in the cache? Now you either have a lot of cached values that will no longer be used, or you need to make the GC implementation more complicated to track the lifetime of cached value types.

With these downsides, what is the potential win? A smaller memory footprint in an application that does a lot of long-lived boxing of equal value types. Since this win would affect only a small number of applications and can be worked around by changing code, I'm going to agree with the C# spec writers' decision here.
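One possible reading of "worked around by changing code" is to avoid the boxing altogether, for example by replacing a non-generic ArrayList with a generic List<int>. The sketch below is an assumption about what such a change might look like; the class and method names are made up for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class BoxingWorkaround
{
    // Non-generic collection: every Add boxes the int into a new heap object.
    public static int SumBoxed(int count)
    {
        var list = new ArrayList();
        for (int i = 0; i < count; i++)
        {
            list.Add(i % 10); // implicit box
        }

        int sum = 0;
        foreach (object o in list)
        {
            sum += (int)o; // unbox
        }
        return sum;
    }

    // Generic collection: the ints are stored directly, so nothing is boxed.
    public static int SumUnboxed(int count)
    {
        var list = new List<int>();
        for (int i = 0; i < count; i++)
        {
            list.Add(i % 10);
        }

        int sum = 0;
        foreach (int value in list)
        {
            sum += value;
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumBoxed(100));   // 450
        Console.WriteLine(SumUnboxed(100)); // 450
    }
}
```

Both methods compute the same result; the difference is only in how many short-lived objects the first one hands to the GC.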

Philip Rieck answered Nov 11 '22 12:11


Boxed value objects are not necessarily immutable. It is possible to change the value in a boxed value type, such as through an interface.

So if boxing a value type always returned the same instance for the same original value, it would create shared references that may not be appropriate (for example, two different value-type instances that happen to have the same value would end up with the same reference, even though they should not).

public interface IBoxed
{
    int X { get; set; }
    int Y { get; set; }
}

public struct BoxMe : IBoxed
{
    public int X { get; set; }

    public int Y { get; set; }
}

public static void Test()
{
    BoxMe original = new BoxMe()
                        {
                            X = 1,
                            Y = 2
                        };
    
    object boxed1 = (object) original;
    object boxed2 = (object) original;

    ((IBoxed) boxed1).X = 3;
    ((IBoxed) boxed1).Y = 4;

    Console.WriteLine("original.X = " + original.X);
    Console.WriteLine("original.Y = " + original.Y);
    Console.WriteLine("boxed1.X = " + ((IBoxed)boxed1).X);
    Console.WriteLine("boxed1.Y = " + ((IBoxed)boxed1).Y);
    Console.WriteLine("boxed2.X = " + ((IBoxed)boxed2).X);
    Console.WriteLine("boxed2.Y = " + ((IBoxed)boxed2).Y);
}

Produces this output:

original.X = 1
original.Y = 2
boxed1.X = 3
boxed1.Y = 4
boxed2.X = 1
boxed2.Y = 2

If boxing didn't create a new instance, then boxed1 and boxed2 would have the same values, which would be inappropriate if they were created from different original value-type instances.
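To illustrate the hazard, here is a hedged sketch that simulates a shared box by simply copying the reference; this stands in for a hypothetical cache handing back the same instance and is not how the CLR behaves. The types mirror the ones above but are trimmed to a single property, and SharedBoxDemo is an illustrative name.

```csharp
using System;

public interface IBoxed
{
    int X { get; set; }
}

public struct BoxMe : IBoxed
{
    public int X { get; set; }
}

public static class SharedBoxDemo
{
    // Hypothetical caching behaviour: both variables refer to one box.
    public static int ReadThroughSharedBox()
    {
        object boxed1 = new BoxMe { X = 1 };
        object boxed2 = boxed1;    // stand-in for "the cache returned the same instance"
        ((IBoxed)boxed1).X = 3;    // mutates the boxed copy in place
        return ((IBoxed)boxed2).X; // the change is visible through boxed2
    }

    // Actual behaviour: each box operation copies the value into a new object.
    public static int ReadThroughSeparateBoxes()
    {
        BoxMe original = new BoxMe { X = 1 };
        object boxed1 = original;
        object boxed2 = original;
        ((IBoxed)boxed1).X = 3;    // only the first box changes
        return ((IBoxed)boxed2).X;
    }

    public static void Main()
    {
        Console.WriteLine(ReadThroughSharedBox());     // 3
        Console.WriteLine(ReadThroughSeparateBoxes()); // 1
    }
}
```

With a shared box, a write through one reference leaks into every other holder of that box, which is the behaviour the answer argues against.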

Samuel Neff answered Nov 11 '22 12:11


There's an easy explanation for this: un/boxing is fast, and it needed to be back in the .NET 1.x days. Once the JIT compiler has generated the machine code for it, there are only a handful of CPU instructions, all inline without method calls (not counting corner cases like nullable types and large structs).

The effort of looking up a cached value would greatly diminish the speed of this code.

Hans Passant answered Nov 11 '22 14:11