
Simplest rule of thumb to avoid breaking strict-aliasing rules?

While reading another question about aliasing (What is the strict aliasing rule?) and its top answer, I realised I still wasn't entirely satisfied, even though I think I understood everything there.

(This question is now tagged as C and C++. If your answer refers to just one of these, please clarify which.)

So I want to understand how to do some development in this area, casting pointers in aggressive ways, but with a simple conservative rule that ensures I don't introduce UB. I have a proposal here for such a rule.

(Update: of course, we could just avoid all type punning. But that's not very educational. Unless of course, there are literally zero well-defined exceptions, beyond the union exception.)

Update 2: I understand now why the method proposed in this question is not correct. However, it is still interesting to know whether a simple, safe, alternative exists. As of now, there is at least one answer that proposes such a solution.

This is the original example:

int main()
{
   // Get a 32-bit buffer from the system
   uint32_t* buff = malloc(sizeof(Msg));

   // Alias that buffer through message
   Msg* msg = (Msg*)(buff);

   // Send a bunch of messages    
   for (int i =0; i < 10; ++i)
   {
      msg->a = i;
      msg->b = i+1;
      SendWord(buff[0] );
      SendWord(buff[1] );   
   }
}

The important line is:

Msg* msg = (Msg*)(buff);

which means there are now two pointers (of different types) pointing to the same data. My understanding is that any attempt to write through one of these will render the other pointer essentially invalid. (By 'invalid' I mean that we can ignore it safely, but that reading/writing through an invalid pointer is UB.)

Msg* msg = (Msg*)(buff);
msg->a = 5;           // writing to one of the two pointers
SendWord(buff[0] );   // renders the other, buffer, invalid

Therefore, my proposed rule is that, once you create the second pointer (i.e. create msg), you should immediately and permanently 'retire' the other pointer.

What better way to retire a pointer than to set it to NULL:

Msg* msg = (Msg*)(buff);
buff = NULL; // 'retire' buff. now just one pointer
msg->a = 5;

Now, the last line assigning to msg->a can't invalidate any other pointers because, of course, there are none.

Next, of course, we have to find a way to call SendWord(buff[1] );. This can't be done immediately because buff has been retired and is NULL. My proposal now is to cast back again.

Msg* msg = (Msg*)(buff);
buff = NULL; // 'retire' buff. now just one pointer
msg->a = 5;

buff = (uint32_t*)(msg);   // cast back again
msg = NULL;                // ... and now retire msg

SendWord(buff[1] );

In summary, every time you cast a pointer between two 'incompatible' types (I'm not sure how to define 'incompatible'?) then you should immediately 'retire' the old pointer. Set it to NULL explicitly if that helps you to enforce the rule.

Is this conservative enough?

Perhaps this is too conservative and has other problems, but I first want to know if this is conservative enough to avoid introducing UB via offending strict aliasing.

Finally, to recap, here is the original code, modified to use this rule:

int main()
{
   // Get a 32-bit buffer from the system
   uint32_t* buff = malloc(sizeof(Msg));

   // Send a bunch of messages    
   for (int i =0; i < 10; ++i)
   {  // here, buff is 'valid'

      Msg* msg = (Msg*)(buff);
      buff = NULL;
      // here, only msg is 'valid', as buff has been retired
      msg->a = i;
      msg->b = i+1;
      buff = (uint32_t*) msg;  // switch back to buff being 'valid'
      msg = NULL;              // ... by retiring msg
      SendWord(buff[0] );
      SendWord(buff[1] );
      // now, buff is valid again and we can loop around again
   }
}
Aaron McDaid asked Jul 15 '15 11:07


3 Answers

My understanding is that any attempt to write through one of these will render the other pointer essentially invalid.

As long as you don't access the object through the type-punned pointer, the other, "official" pointer is fine. If you do access it, the behavior is undefined: it may appear to work, do what you described, or do something else entirely, including making the other pointer invalid. Compilers may treat UB however they please.

According to the standard, the well-defined way to treat the contents of buff as a Msg is to copy the bytes with memcpy/memmove:

memcpy( (void*)msg, (const void*) buff, sizeof (*msg));
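A minimal C sketch of the memcpy approach, under stated assumptions: the question never shows Msg or SendWord, so the two-word Msg layout, the recording SendWord stub, and the send_msg helper below are all hypothetical. The message is filled in through its own type, and its bytes are then copied into a uint32_t array, so no object is ever accessed through an lvalue of the wrong type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message layout; the question does not define Msg. */
typedef struct {
    uint32_t a;
    uint32_t b;
} Msg;

/* Hypothetical stand-in for the question's SendWord: it only records words. */
static uint32_t sent[32];
static size_t sent_count = 0;

static void SendWord(uint32_t word)
{
    assert(sent_count < sizeof sent / sizeof sent[0]);
    sent[sent_count++] = word;
}

/* Copy the bytes of *msg into a word array, then send the words. */
static void send_msg(const Msg *msg)
{
    uint32_t words[sizeof(Msg) / sizeof(uint32_t)];

    /* Assumes Msg is an exact multiple of the word size (no trailing bytes). */
    assert(sizeof words == sizeof *msg);

    memcpy(words, msg, sizeof words);  /* byte copy, no aliasing involved */
    for (size_t i = 0; i < sizeof words / sizeof words[0]; ++i)
        SendWord(words[i]);
}
```

With this layout, calling send_msg on a Msg holding a = 5 and b = 6 hands 5 and then 6 to SendWord. Optimizing compilers typically turn a small fixed-size memcpy like this into plain loads and stores, so the copy need not cost anything at run time.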

Also, UB is triggered not only by writing but also by reading, or by any other access to the object:

If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined:

Some compilers, such as GCC, Clang and ICC (probably MSVC as well), also allow "suspending" that rule, for example with the -fno-strict-aliasing option in GCC and Clang, but that cannot be considered portable or standard behavior. Further techniques, and the code they generate, are thoroughly analyzed here.

Do you really need to break the strict-aliasing rule?

Most of the time, no, you do not need to. There are perfectly legal ways around the problem. In the above case, simply store a plain pointer within the struct and send each member in a determined format.
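One possible reading of "send each member in a determined format", sketched in C. The Msg layout, the recording SendWord stub, and the send_messages helper are assumptions made for illustration; none of them appear in the question. Because every member is read through the struct's own type, there is no type punning at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message layout; the question does not define Msg. */
typedef struct {
    uint32_t a;
    uint32_t b;
} Msg;

/* Hypothetical stand-in for the question's SendWord: it only records words. */
static uint32_t sent[32];
static size_t sent_count = 0;

static void SendWord(uint32_t word)
{
    assert(sent_count < sizeof sent / sizeof sent[0]);
    sent[sent_count++] = word;
}

/* Send `count` messages, one member at a time, in a fixed order. */
static void send_messages(int count)
{
    Msg msg;

    for (int i = 0; i < count; ++i) {
        msg.a = (uint32_t)i;
        msg.b = (uint32_t)i + 1u;
        SendWord(msg.a);  /* first word on the wire is always a */
        SendWord(msg.b);  /* second word is always b */
    }
}
```

This also makes the wire order independent of struct padding, since the order is fixed by the calls rather than by the memory layout.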

edmz answered Sep 19 '22 12:09


C++ answer: that won't work. The C++ strict aliasing rule explicitly enumerates which types can be used to access an object. If you use a different type, you get UB, even if you've "retired" all access methods of a different type. As per C++14 (n4140) 3.10/10, the allowed types are:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • the dynamic type of the object,
  • a cv-qualified version of the dynamic type of the object,
  • a type similar (as defined in 4.4) to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
  • an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
  • a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
  • a char or unsigned char type.

"Similar types," as per 4.4, pertain to modifying cv-qualification of multi-level pointers.

So, if you've ever written into an area through a pointer (or other accessor) to one type, you cannot access it through a pointer to a different type (unless sanctioned by 3.10/10), even if you forget the old pointer.
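As an illustration of an access that the list does sanction, here is a small sketch of the character-type exemption. It is written in C, which has an equivalent exemption for character types; the helper name sum_bytes is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add up the bytes of any object's representation.
   Reading through unsigned char is allowed whatever the object's type is. */
static unsigned sum_bytes(const void *obj, size_t size)
{
    assert(obj != NULL);

    const unsigned char *bytes = obj;
    unsigned total = 0;

    for (size_t i = 0; i < size; ++i)
        total += bytes[i];
    return total;
}
```

For a uint32_t holding 0x01020304, sum_bytes returns 10 regardless of byte order, because the same four bytes are visited either way.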

If you've never written to an area through a particular type, casting pointers back and forth is not an issue.
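A short C sketch of that last point, assuming a hypothetical two-word Msg and a made-up helper named roundtrip. The pointer is cast to uint32_t* and back, but the memory is only ever read and written as a Msg, so the intermediate pointer type never takes part in an access.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical message layout; the question does not define Msg. */
typedef struct {
    uint32_t a;
    uint32_t b;
} Msg;

/* Round-trip a Msg pointer through uint32_t* without dereferencing it. */
static uint32_t roundtrip(uint32_t value)
{
    Msg *msg = malloc(sizeof *msg);
    assert(msg != NULL);

    uint32_t *as_words = (uint32_t *)msg;  /* cast only, never dereferenced */
    Msg *back = (Msg *)as_words;           /* converting back yields the same pointer */
    assert(back == msg);

    back->a = value;                       /* every access uses type Msg */
    uint32_t result = msg->a;

    free(msg);
    return result;
}
```

The cast itself is harmless here because malloc returns suitably aligned storage; the trouble in the question starts only when as_words would be dereferenced after the memory has been written as a Msg.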

Angew is no longer proud of SO answered Sep 19 '22 12:09


The rule is:

"Unless the pointers are of compatible types, you cannot have two pointers pointing to the same memory."

Here is a simpler example that can turn into an endless loop:

int *some_buff = malloc(sizeof(whatever));
memset(some_buff,0,sizeof(whatever));
while (some_buff[0] == 0)
{
    whatever *manipulator = (whatever*)some_buff;
    manipulate(manipulator);
}

This is essentially how the compiler will/can approach this code:

The test for some_buff[0] == 0 can be optimized out, because there is no valid way for some_buff[0] to change. The memory is accessed through manipulator, but manipulator isn't of a compatible type, so according to the strict aliasing rule the value of some_buff[0] cannot change.

If you want an even simpler example:

int *some_buff = malloc(sizeof(whatever));
memset(some_buff,0,sizeof(whatever));
whatever *manipulator = (whatever*)some_buff;
manipulate(manipulator);
printf("%d\n",some_buff[0]);

It is perfectly OK for this code to always print zero, no matter what manipulate does.

Šimon Tóth answered Sep 20 '22 12:09