
Is this (char*)&x cast's behaviour well-defined?

While writing some C code, I came across a little problem where I had to convert a character into a "string" (some memory chunk the beginning of which is given by a char* pointer).

The idea is that if some sourcestr pointer is set (not NULL), then I should use it as my "final string", otherwise I should convert a given charcode into the first character of another array, and use it instead.

For the purposes of this question, we'll assume that the types of the variables cannot be changed beforehand. In other words, I can't just store my charcode as a const char* instead of an int.

Because I tend to be lazy, I thought to myself: "hey, couldn't I just use the character's address and treat that pointer as a string?". Here's a little snippet of what I wrote (don't smash my head against the wall just yet!):

```c
int charcode    = FOO;   /* Assume this is always valid ASCII. */

char* sourcestr = "BAR"; /* Case #1 */
/* char* sourcestr = NULL;  Case #2 (use instead of Case #1) */

char* finalstr  = sourcestr ? sourcestr : (char*)&charcode;
```

Now of course I tried it, and as I expected, it does work. Even with a few warning flags, the compiler is still happy. However, I have this weird feeling that this is actually undefined behaviour, and that I just shouldn't be doing it.

I think so because a char* must point to a null-terminated array in order to be printed properly as a string (and I want mine to be!). Yet I have no guarantee that the byte at (char*)&charcode + 1 will be zero, so I might end up with some buffer overflow madness.

Is there an actual reason why it does work properly, or have I just been lucky to get zeroes in the right places when I tried?

(Note that I'm not looking for other ways to achieve the conversion. I could simply use a char tmp[2] = {0} variable, and put my character at index 0. I could also use something like sprintf or snprintf, provided I'm careful enough with buffer overflows. There's a myriad of ways to do this, I'm just interested in the behaviour of this particular cast operation.)

Edit: I've seen a few people call this hackery, and let's be clear: I completely agree with you. I'm not enough of a masochist to actually do this in released code. This is just me getting curious ;)

John WH Smith asked Feb 23 '16 12:02


3 Answers

Your code is well-defined, as you can always convert an object pointer to char* and use it to inspect that object's bytes. There are some caveats, though:

  1. Note that "BAR" is a string literal. In C its type is char[4] rather than const char*, but attempting to modify its contents is still undefined behaviour, so don't.

  2. Don't attempt to use (char*)&charcode as a parameter to any of the string functions in the C standard library. It is not guaranteed to be null-terminated, so in that sense you cannot treat it as a string.

  3. Pointer arithmetic on (char*)&charcode is valid over the bytes of charcode, up to and including one past its last byte: (char*)&charcode + n is a valid pointer for 0 <= n <= sizeof(int). You may dereference it only for n < sizeof(int); never dereference the pointer one past the end of charcode.

Bathsheba answered Oct 01 '22 22:10


The cast and assignment char* finalstr = (char*)&charcode; are well-defined.

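Printing finalstr as a string with printf's %s when it points to charcode is undefined behavior unless a zero byte happens to lie within the bytes of charcode, and nothing in the standard guarantees one.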

Rather than resorting to hackery and hiding a string in an int, convert the value stored in the integer to a string with a conversion function of your choice. One possible example is:

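```c
char str[32] = { 0 };
/* "%c" writes the character itself; "%d" would write its decimal code, e.g. "65" for 'A'. */
snprintf( str , sizeof str , "%c" , charcode );
char* finalstr = sourcestr ? sourcestr : str;
```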

or use whatever other (defined!) conversion you like.

2501 answered Oct 01 '22 21:10


Like others said, it happens to work because your machine stores an int in little-endian order and your char is smaller than an int. Also, the ASCII value of your character is either below 128 or your chars are unsigned (otherwise there would be sign extension). This means that the character's value sits in the low-order byte of the int, which on a little-endian machine is the byte at the lowest address, the one (char*)&charcode points to, and the rest of the int is all zeroes (assuming any normal representation of an int). So a zero byte follows the character and terminates the "string". You're not "lucky", you have a pretty normal machine.

It is also completely undefined behavior to give that char pointer to any function that expects a string. You might get away with it now, but the compiler is free to optimize that into something completely different.

For example, if you call printf just after that assignment, the compiler is free to assume that you always pass a valid string to printf. If sourcestr were NULL, printf would be called with something that isn't a string, and since the compiler may assume undefined behavior never happens, it can conclude that sourcestr is never NULL. Any check of sourcestr against NULL before or after that assignment then becomes unnecessary, and this assumption is allowed to spread everywhere in your code.

This was rarely something to worry about: until a decade or so ago you could get away with tricks uglier than this. Then compiler writers started an arms race over how literally they could follow the C standard to justify more and more brutal optimizations. Today compilers are getting more and more aggressive, and while the optimization I speculated about probably doesn't exist yet, if a compiler writer sees this, they'll probably implement it just because they can.

Art answered Oct 01 '22 21:10