
How to specify enum size in GCC?

Tags: c, enums, gcc

I want to specify an enumeration size of 64 bits for an enumeration. How is this possible through GCC? The code does not need to be 'portable' in that I am only interested in making the code work on GCC compiling for x86-32 and x86-64 Linux. That means any hack which can provide the functionality I want is fine as long as it works for those targets.

Given this code:

#include <stdlib.h>
#include <stdio.h>

enum some_enum
{
    garbage1,
    garbage2
};

int main(void)
{
    enum some_enum some_val;
    printf("size: %zu\n", sizeof(some_val));

    return EXIT_SUCCESS;
}

This currently prints 4, whereas I want to force the size to be 8. Attempting to specify a value in the enum wider than 4 bytes causes a warning. For example,

enum some_enum
{
    garbage1 = '12345',
    garbage2
};

Would produce:

warning: character constant too long for its type [enabled by default]

An answer to a similar question here doesn't seem to yield any good results. That is, the same warning is produced as a result of:

enum some_enum
{
    garbage1 = 'adfs',
    garbage2 = 'asdfasdf'
};

Note: the multi-character warning can be turned off by compiling with -Wno-multichar.


Rationale

Since people are interested in why I am doing this, I have written a disassembler engine. I get each part of an instruction as a string. So I want the enumeration to look like this:

enum mnemonic
{
    mov = 'mov',
    cmp = 'cmp',
    sysenter = 'sysenter'
};

I can then store semantic information easily with some code like this:

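#include <stdint.h>   /* uint64_t */
#include <string.h>   /* strncpy */

enum mnemonic insn;

const char *example_insn = "mov";
uint64_t    buf          = 0;

/* strncpy zero-fills the remaining bytes of buf after the string ends */
strncpy((char *)&buf, example_insn, sizeof(uint64_t));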

If buf were an enum mnemonic, we would need to do nothing else. The strncpy is used to pad the bytes after the end of the string with null characters. If I am not able to do this, I would have to do something like this instead:

if(strcmp(example_insn, "mov") == 0) {
    insn = mov;
} else if(strcmp(example_insn, "cmp") == 0) {
    insn = cmp;
} ...

Since this routine is going to be hit millions of times, this sort of optimisation would make a huge difference. I intend to do the same for operands such as registers too.

Asked by Mike Kwan, Mar 01 '12 21:03




1 Answer

As Matteo Italia's answer says, gcc lets you define a 64-bit enumeration type by specifying a 64-bit value for one of the members. For example:

enum some_enum {
    /* ... */
    max = 0x7fffffffffffffff
};

As for your use of 'mov', 'cmp', and so forth, there is no necessary correlation between the representation of a string literal like "mov" and the representation of a multi-character character constant like 'mov'.

The latter is legal (and supported by gcc), but the value is implementation-defined. The standard says that the type is always int, and gcc doesn't seem to have an extension that lets you override that. So if int is 4 bytes, then 'sysenter', if it's accepted at all, won't necessarily have the value you're looking for. gcc seems to ignore all but the low-order bytes of such a constant. The value of the constant seems to be consistent across big-endian and little-endian systems -- which means that it won't consistently match the representation of a similar string literal.

For example, this program:

#include <stdio.h>
int main(void) {
    const char *s1 = "abcd";
    const char *s2 = "abcdefgh";
    printf("'abcd'     = 0x%x\n", (unsigned)'abcd');
    printf("'abcdefgh' = 0x%x\n", (unsigned)'abcdefgh');
    printf("*(unsigned*)s1 = 0x%x\n", *(unsigned*)s1);
    printf("*(unsigned*)s2 = 0x%x\n", *(unsigned*)s2);
    return 0;
}

produces this output when compiled with gcc on a little-endian system (x86):

'abcd'     = 0x61626364
'abcdefgh' = 0x65666768
*(unsigned*)s1 = 0x64636261
*(unsigned*)s2 = 0x64636261

and this output on a big-endian system (SPARC):

'abcd'     = 0x61626364
'abcdefgh' = 0x65666768
*(unsigned*)s1 = 0x61626364
*(unsigned*)s2 = 0x61626364

So I'm afraid your idea of matching character constants like 'mov' against strings like "mov" isn't going to work. (Conceivably you could normalize the string representations to big-endian, but I wouldn't take that approach myself.)

The problem you're trying to solve is quickly mapping strings like "mov" to specific integer values that represent CPU instructions. You're right that a long sequence of strcmp() calls is going to be inefficient (have you actually measured it and found that the speed is unacceptable?) -- but there are better ways. A hash table of some sort is probably the best. There are tools to generate perfect hash functions, so that a relatively cheap computation on the value of the string gives you a unique integer value.

You won't be able to write the definitions of your enumeration values quite as conveniently, but once you have the right hash function you can write a program to generate the C source code for the enum type.

That's assuming that an enum is the best approach here; it might not be. If I were doing this, the central data structure would be a collection of structs, where each one contains the string name of the operator and whatever other information is associated with it. The hash function would map strings like "mov" to indices in this collection. (I'm being deliberately vague about what kind of "collection" to use; with the right hash function, it might be a simple array.) With this kind of solution, I don't think the 64-bit enum type is needed.

Answered by Keith Thompson, Sep 20 '22 12:09
