Representing dynamic typing in C

Tags:

data-representation

I'm writing a dynamically-typed language. Currently, my objects are represented in this way:

struct Class { struct Class* class; struct Object* (*get)(struct Object*,struct Object*); };
struct Integer { struct Class* class; int value; };
struct Object { struct Class* class; };
struct String { struct Class* class; size_t length; char* characters; };

The goal is that I should be able to pass everything around as a struct Object* and then discover the type of the object by comparing the class attribute. For example, to cast an integer for use I would simply do the following (assume that integer is of type struct Class*):

struct Object* foo = bar();

// increment foo
if(foo->class == integer)
    ((struct Integer*)foo)->value++;
else
    handleTypeError();

The problem is that, as far as I know, the C standard makes no promises about how structures are stored. On my platform this works. But on another platform struct Integer might store value before class, and when I accessed foo->class in the above I would actually be accessing foo->value, which is obviously bad. Portability is a big goal here.

There are alternatives to this approach:

struct Object
{
    struct Class* class;
    union Value
    {
        struct Class c;
        int i;
        struct String s;
    } value;
};

The problem here is that the union uses up as much space as the size of the largest thing that can be stored in the union. Given that some of my types are many times as large as my other types, this would mean that my small types (int) would take up as much space as my large types (map) which is an unacceptable tradeoff.
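
To make the size cost concrete, here is a minimal sketch. The `struct Map` layout, the `TaggedObject` name, and the helper function are hypothetical and exist only to show that a union is at least as large as its largest member:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical large type, standing in for something like a map. */
struct Map { void *buckets[64]; size_t count; };

union Value {
    int i;
    struct Map m;
};

/* Every object carries the full union, whatever it actually holds. */
struct TaggedObject {
    const void *class;   /* stand-in for struct Class* */
    union Value value;
};

/* Bytes left unused when the union holds only an int. */
static size_t bytes_unused_by_int(void)
{
    assert(sizeof(union Value) >= sizeof(struct Map));
    return sizeof(union Value) - sizeof(int);
}
```

With these assumed definitions, an object that only holds an int still occupies at least `sizeof(struct Map)` bytes plus the class pointer.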

struct Object
{
    struct Class* class;
    void* value;
};

This creates a level of redirection that will slow things down. Speed is a goal here.

The final alternative is to pass around void*s and manage the internals of the structure myself. For example, to implement the type test mentioned above:

void* foo = bar();

// increment foo
if(*((struct Class**) foo) == integer)
    (*((int*)((char*)foo + sizeof(struct Class*))))++;
else
    handleTypeError();

This gives me everything I want (portability, different sizes for different types, etc.) but has at least two downsides:

Hideous, error-prone C. The code above only calculates a single-member offset; it will get much worse with types more complex than integers. I might be able to alleviate this a bit using macros, but this will be painful no matter what.
Since there is no struct that represents the object, I don't have the option of stack allocations (at least without implementing my own stack on the heap).

Basically, my question is, how can I get what I want without paying for it? Is there a way to be portable, have variance in size for different types, not use redirection, and keep my code pretty?

EDIT: This is the best response I've ever received for an SO question. Choosing an answer was hard. SO only allows me to choose one answer, so I chose the one that led me to my solution, but you all received upvotes.

asked Sep 28 '09 05:09

Imagist

2 Answers

See Python PEP 3123 (http://www.python.org/dev/peps/pep-3123/) for how Python solves this problem using standard C. The Python solution can be directly applied to your problem. Essentially you want to do this:

struct Object { struct Class* class; };
struct Integer { struct Object object; int value; };
struct String { struct Object object; size_t length; char* characters; };

You can safely cast Integer* to Object*, and Object* to Integer* if you know that your object is an integer.
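
As a rough sketch of how the type test from the question might look with this layout (the simplified `struct Class`, the `integer_class` descriptor, and the helper names are assumptions made for illustration, not part of the original code):

```c
#include <assert.h>
#include <stddef.h>

struct Class { const char *name; };   /* simplified, hypothetical */
struct Object { struct Class *class; };
struct Integer { struct Object object; int value; };

static struct Class integer_class = { "Integer" };   /* hypothetical descriptor */

/* Upcast: taking the address of the first member needs no cast. */
static struct Object *integer_as_object(struct Integer *i)
{
    return &i->object;
}

/* Returns 0 on success, -1 on a type error. */
static int increment(struct Object *obj)
{
    if (obj->class != &integer_class)
        return -1;
    /* Downcast: a pointer to a struct, suitably converted, points to
       its first member and vice versa, so this is valid once the
       class check has confirmed obj really is inside an Integer. */
    ((struct Integer *)obj)->value++;
    return 0;
}

/* The embedded Object is guaranteed to sit at offset 0. */
static void check_layout(void)
{
    assert(offsetof(struct Integer, object) == 0);
}
```

Each type keeps its own size and there is no extra level of indirection; the only cost is writing `i->object.class` instead of `i->class` when working with the concrete type.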

answered Oct 22 '22 02:10

Josh Haberman

C gives you sufficient guarantees that your first approach will work. The only modification you need to make is that in order to make the pointer aliasing OK, you must have a union in scope that contains all of the structs that you are casting between:

union allow_aliasing {
    struct Class class;
    struct Object object;
    struct Integer integer;
    struct String string;
};

(You don't need to ever use the union for anything - it just has to be in scope)

I believe the relevant part of the standard is this:

[#5] With one exception, if the value of a member of a union object is used when the most recent store to the object was to a different member, the behavior is implementation-defined. One special guarantee is made in order to simplify the use of unions: If a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the completed type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

(This doesn't directly say it's OK, but I believe that it does guarantee that if two structs have a common initial sequence and are put into a union together, they'll be laid out in memory the same way - it's certainly been idiomatic C for a long time to assume this, anyway).
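
A minimal sketch of the common-initial-sequence rule, using the structs from the question. The `integer_class` descriptor and the helper names are hypothetical. Note that this sketch reads the shared member through the union itself, which is the case the quoted text clearly covers; whether visibility of the union alone is enough for plain pointer casts is the broader reading described above, and some compilers are stricter about it.

```c
#include <assert.h>
#include <stddef.h>

struct Object;
struct Class   { struct Class *class; struct Object *(*get)(struct Object *, struct Object *); };
struct Object  { struct Class *class; };
struct Integer { struct Class *class; int value; };
struct String  { struct Class *class; size_t length; char *characters; };

union allow_aliasing {
    struct Class class;
    struct Object object;
    struct Integer integer;
    struct String string;
};

static struct Class integer_class;   /* hypothetical class descriptor */

/* Inspect the common initial member through a different union member
   than the one that was last stored. */
static struct Class *class_of(const union allow_aliasing *u)
{
    return u->object.class;
}

/* The first member of any struct is at offset 0, so class lines up. */
static void check_layout(void)
{
    assert(offsetof(struct Object, class) == 0);
    assert(offsetof(struct Integer, class) == 0);
    assert(offsetof(struct String, class) == 0);
}
```

For example, after storing `u.integer.class = &integer_class;`, calling `class_of(&u)` returns `&integer_class` even though the read goes through the `object` member.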

answered Oct 22 '22 01:10

caf
