Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

In C#/.NEt does a dynamic type take less space than object?

I have a console application that allows the users to specify variables to process. These variables come in three flavors: string, double and long (with double and long being by far the most commonly used types). The user can specify whatever variables they like and in whatever order so my system has to be able to handle that. To this end in my application I had been storing these as object and then casting/uncasting them as required. for example:

public class UnitResponse
{
    public object Value { get; set; }
}

My understanding was that boxed objects take up a bit more memory (about 12 bytes) than a standard value type.

My question is: would it be more efficient to use the dynamic keyword to store these values? It might get around the boxing/unboxing issue, and if it is more efficient how would this impact performance?

EDIT

To provide some context and prevent the "are you sure you're using enough RAM to worry about this" in my worst case I have 420,000,000 datapoints to worry about (60 variables * 7,000,000 records). This is in addition to a bunch of other data I keep about each variable (including a few booleans, etc.). So reducing memory does have a HUGE impact.

like image 964
Jeffrey Cameron Avatar asked Jan 27 '11 23:01

Jeffrey Cameron


People also ask

What does << mean in C?

<< is the left shift operator. It is shifting the number 1 to the left 0 bits, which is equivalent to the number 1 .

What does %d do in C?

%d is a format specifier, used in C Language. Now a format specifier is indicated by a % (percentage symbol) before the letter describing it. In simple words, a format specifier tells us the type of data to store and print. Now, %d represents the signed decimal integer.


1 Answers

OK, so the real question here is "I've got a freakin' enormous data set that I am storing in memory, how do I optimize its performance in both time and memory space?"

Several thoughts:

  • You are absolutely right to hate and fear boxing. Boxing has big costs. First, yes, boxed objects take up extra memory. Second, boxed objects get stored on the heap, not on the stack or in registers. Third, they are garbage collected; every single one of those objects has to be interrogated at GC time to see if it contains a reference to another object, which it never will, and that's a lot of time on the GC thread. You almost certainly need to do something to avoid boxing.

Dynamic ain't it; it's boxing plus a whole lot of other overhead. (C#'s dynamic is very fast compared to other dynamic dispatch systems, but it is not fast or small in absolute terms).

It's gross, but you could consider using a struct whose layout shares memory between the various fields - like a union in C. Doing so is really really gross and not at all safe but it can help in situations like these. Do a web search for "StructLayoutAttribute"; you'll find tutorials.

  • Long, double or string, really? Can't be int, float or string? Is the data really either in excess of several billion in magnitude or accurate to 15 decimal places? Wouldn't int and float do the job for 99% of the cases? They're half the size.

Normally I don't recommend using float over double because its a false economy; people often economise this way when they have ONE number, like the savings of four bytes is going to make the difference. The difference between 42 million floats and 42 million doubles is considerable.

  • Is there regularity in the data that you can exploit? For example, suppose that of your 42 million records, there are only 100000 actual values for, say, each long, 100000 values for each double, and 100000 values for each string. In that case, you make an indexed storage of some sort for the longs, doubles and strings, and then each record gets an integer where the low bits are the index, and the top two bits indicate which storage to get it out of. Now you have 42 million records each containing an int, and the values are stored away in some nicely compact form somewhere else.

  • Store the booleans as bits in a byte; write properties to do the bit shifting to get 'em out. Save yourself several bytes that way.

  • Remember that memory is actually disk space; RAM is just a convenient cache on top of it. If the data set is going to be too large to keep in RAM then something is going to page it back out to disk and read it back in later; that could be you or it could be the operating system. It is possible that you know more about your data locality than the operating system does. You could write your data to disk in some conveniently pageable form (like a b-tree) and be more efficient about keeping stuff on disk and only bringing it in to memory when you need it.

like image 123
Eric Lippert Avatar answered Sep 24 '22 14:09

Eric Lippert