
Using C Preprocessing to get integer value of a string

How would I create a C macro to get the integer value of a string? The specific use case follows on from a question here. I want to change code like this:

enum insn {
    sysenter = (uint64_t)'r' << 56 | (uint64_t)'e' << 48 |
               (uint64_t)'t' << 40 | (uint64_t)'n' << 32 |
               (uint64_t)'e' << 24 | (uint64_t)'s' << 16 |
               (uint64_t)'y' << 8  | (uint64_t)'s',
    mov = (uint64_t)'v' << 16 | (uint64_t)'o' << 8 |
          (uint64_t)'m'
};

To this:

enum insn {
    sysenter = INSN_TO_ENUM("sysenter"),
    mov      = INSN_TO_ENUM("mov")
};

Where INSN_TO_ENUM expands to the same code. The performance would be the same, but the readability would be boosted by a lot.

I suspect that this form might not be possible because of the C preprocessor's inability to process strings, so the following (a variadic macro) would be a less preferred but acceptable solution:

enum insn {
    sysenter = INSN_TO_ENUM('s','y','s','e','n','t','e','r'),
    mov      = INSN_TO_ENUM('m','o','v')
};
Mike Kwan asked Mar 02 '12 15:03


3 Answers

Here's a compile-time, pure C solution, which you indicated as acceptable. You may need to extend it for longer mnemonics. I'll keep on thinking about the desired one (i.e. INSN_TO_ENUM("sysenter")). Interesting question :)

#include <stdio.h>

/* Named variadic parameters (t...) are a GNU extension, accepted by GCC and Clang. */
#define head(h, t...) h
#define tail(h, t...) t

#define A(n, c...) (((long long) (head(c))) << (n)) | B(n + 8, tail(c))
#define B(n, c...) (((long long) (head(c))) << (n)) | C(n + 8, tail(c))
#define C(n, c...) (((long long) (head(c))) << (n)) | D(n + 8, tail(c))
#define D(n, c...) (((long long) (head(c))) << (n)) | E(n + 8, tail(c))
#define E(n, c...) (((long long) (head(c))) << (n)) | F(n + 8, tail(c))
#define F(n, c...) (((long long) (head(c))) << (n)) | G(n + 8, tail(c))
#define G(n, c...) (((long long) (head(c))) << (n)) | H(n + 8, tail(c))
#define H(n, c...) (((long long) (head(c))) << (n)) /* extend here */

#define INSN_TO_ENUM(c...) A(0, c, 0, 0, 0, 0, 0, 0, 0)

enum insn {
    sysenter = INSN_TO_ENUM('s','y','s','e','n','t','e','r'),
    mov      = INSN_TO_ENUM('m','o','v')
};

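int main(void)
{
    /* Cast explicitly: the underlying type of an enum is implementation-defined. */
    printf("sysenter = %llx\nmov = %llx\n",
           (unsigned long long)sysenter, (unsigned long long)mov);
    return 0;
}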
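
For compilers without the GNU named-variadic extension, the same padding idea can be sketched with standard C99 `__VA_ARGS__`. This is an illustration rather than part of the answer above: `PACK8` is a hypothetical helper name, and the sketch assumes at most eight characters per mnemonic. Whether a 64-bit value is accepted as an enumerator still depends on the compiler (GCC and Clang allow it as an extension before C23).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack the first eight arguments, least significant
   byte first. Extra arguments (the zero padding) are swallowed by "...". */
#define PACK8(a, b, c, d, e, f, g, h, ...)              \
    ((uint64_t)(a)       | (uint64_t)(b) << 8  |        \
     (uint64_t)(c) << 16 | (uint64_t)(d) << 24 |        \
     (uint64_t)(e) << 32 | (uint64_t)(f) << 40 |        \
     (uint64_t)(g) << 48 | (uint64_t)(h) << 56)

/* Pad with eight zeros so that one to eight characters all work. */
#define INSN_TO_ENUM(...) PACK8(__VA_ARGS__, 0, 0, 0, 0, 0, 0, 0, 0)

/* The results are integer constant expressions, checked at compile time. */
static_assert(INSN_TO_ENUM('m','o','v') == UINT64_C(0x766f6d), "mov");
static_assert(INSN_TO_ENUM('s','y','s','e','n','t','e','r')
              == UINT64_C(0x7265746e65737973), "sysenter");
```

Because the padding always supplies at least one argument for `...`, this form does not rely on empty variadic arguments.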
Bartosz Moczulski answered Oct 01 '22 20:10


EDIT: This answer may be helpful, so I'm not deleting it, but it doesn't specifically answer the question. It DOES convert strings to numbers, but the result cannot be placed in an enum because the number isn't computed at compile time.

Well, since your integers are 64 bit, you only have the first 8 characters of any string to worry about. Therefore, you can write the thing 8 times, making sure you don't go out of the string bound:

#define GET_NTH_BYTE(x, n)   (sizeof(x) <= (n) ? 0 : ((uint64_t)(x)[n] << ((n)*8)))
#define INSN_TO_ENUM(x)     (GET_NTH_BYTE(x, 0)\
                            |GET_NTH_BYTE(x, 1)\
                            |GET_NTH_BYTE(x, 2)\
                            |GET_NTH_BYTE(x, 3)\
                            |GET_NTH_BYTE(x, 4)\
                            |GET_NTH_BYTE(x, 5)\
                            |GET_NTH_BYTE(x, 6)\
                            |GET_NTH_BYTE(x, 7))

For each byte position, it checks whether the index is within the bounds of the string and, if so, yields the corresponding byte shifted into place; otherwise it yields 0.

Note that this only works on string literals and char arrays: for a char pointer, sizeof(x) is the size of the pointer rather than of the string.
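
The restriction comes from how sizeof treats arrays and pointers. A minimal sketch of the difference; the helper names `bound_for_array` and `bound_for_pointer` are hypothetical and only serve to show which bound the macro would see:

```c
#include <assert.h>
#include <stddef.h>

/* sizeof on a literal counts the characters plus the terminating NUL. */
static_assert(sizeof("mov") == 4, "literal: 3 chars + NUL");

/* Hypothetical helper: for an array, sizeof reports the array size. */
static size_t bound_for_array(void)
{
    const char mnemonic[] = "sysenter";
    return sizeof(mnemonic);          /* 8 characters + NUL */
}

/* Hypothetical helper: for a pointer, sizeof reports the pointer size,
   which has nothing to do with the length of the string it points to. */
static size_t bound_for_pointer(void)
{
    const char *mnemonic = "sysenter";
    return sizeof(mnemonic);          /* sizeof(const char *) */
}
```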

If you want to be able to convert any string, you can pass the length of the string along with it:

#define GET_NTH_BYTE(x, n, l)   ((l) <= (n) ? 0 : ((uint64_t)(x)[n] << ((n)*8)))
#define INSN_TO_ENUM(x, l)     (GET_NTH_BYTE(x, 0, l)\
                               |GET_NTH_BYTE(x, 1, l)\
                               |GET_NTH_BYTE(x, 2, l)\
                               |GET_NTH_BYTE(x, 3, l)\
                               |GET_NTH_BYTE(x, 4, l)\
                               |GET_NTH_BYTE(x, 5, l)\
                               |GET_NTH_BYTE(x, 6, l)\
                               |GET_NTH_BYTE(x, 7, l))

So for example:

size_t length = strlen(your_string);
uint64_t num = INSN_TO_ENUM(your_string, length);  /* the result needs all 64 bits */

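Finally, there is a way to avoid passing the length, but it relies on the compiler evaluating the terms of INSN_TO_ENUM from left to right. The C standard does not guarantee this, because the order in which the operands of | are evaluated is unspecified, so treat it as non-portable: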

static int _nul_seen;
#define GET_NTH_BYTE(x, n)  ((_nul_seen || x[n] == '\0')?(_nul_seen=1)&0:((uint64_t)x[n] << (n*8)))
#define INSN_TO_ENUM(x)     (_nul_seen=0)|\
                              (GET_NTH_BYTE(x, 0)\
                              |GET_NTH_BYTE(x, 1)\
                              |GET_NTH_BYTE(x, 2)\
                              |GET_NTH_BYTE(x, 3)\
                              |GET_NTH_BYTE(x, 4)\
                              |GET_NTH_BYTE(x, 5)\
                              |GET_NTH_BYTE(x, 6)\
                              |GET_NTH_BYTE(x, 7))
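
If the goal is only to avoid passing the length at run time, a plain function with a loop sidesteps the evaluation-order question, since each iteration is sequenced after the previous one. A minimal sketch, assuming a hypothetical helper name `insn_to_u64` that is not part of the answer above; like the macros here, it does not produce a compile-time constant and so cannot initialise an enumerator:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack up to the first 8 bytes of s, least
   significant byte first, stopping at the terminating NUL. */
static uint64_t insn_to_u64(const char *s)
{
    uint64_t value = 0;

    assert(s != NULL);
    for (size_t i = 0; i < 8 && s[i] != '\0'; i++)
        value |= (uint64_t)(unsigned char)s[i] << (i * 8);
    return value;
}
```

Characters beyond the eighth are ignored, which matches the macros' behaviour of only looking at indices 0 to 7.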
Shahbaz answered Oct 01 '22 21:10


If you can use C++11 on a recent compiler, the following

constexpr uint64_t insn_to_enum(const char* x) {
    return *x ? *x + (insn_to_enum(x+1) << 8) : 0;
}

enum insn { sysenter = insn_to_enum("sysenter") };

will work and calculate the constant during compile time.

Gunther Piez answered Oct 01 '22 20:10