Can variables adjacent to a bit-field get corrupted?

I am facing a problem very similar to the one described by the Linux kernel community in "Betrayed by a Bit-Field".

The essence of the problem is that GCC issues 64-bit accesses even when touching a 1-bit bit-field. The unexpected side effect is that the read also pulls in adjacent fields, which may be modified elsewhere in the program. When the modified bit-field value is written back, the adjacent variable's old value is written back with it, and any modification made to it by other threads in the meantime is lost.
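To make the lost update concrete, here is a minimal single-threaded model of that interleaving. It is only a sketch: `Word64`, the byte positions, and the function names are made up for illustration, and the two "threads" are simulated by ordering the statements by hand rather than by running real threads.

```cpp
#include <cassert>

// Hypothetical 8-byte region: byte 0 stands in for an adjacent variable,
// bit 0 of byte 1 stands in for a 1-bit flag. This is not the kernel's layout.
struct Word64 {
    unsigned char bytes[8];
};

// Models a bit-field store done as a 64-bit read-modify-write while another
// thread updates the neighbouring byte in between.
// Returns the neighbour's final value.
unsigned wide_rmw_final_neighbour() {
    Word64 mem = {};
    mem.bytes[0] = 12;        // neighbour starts at 12

    Word64 reg = mem;         // thread A: 64-bit load of the whole word
    mem.bytes[0] = 0;         // thread B: stores 0 to the neighbour
    reg.bytes[1] |= 0x01;     // thread A: sets the flag bit in its private copy
    mem = reg;                // thread A: 64-bit store writes the stale 12 back

    return mem.bytes[0];
}

// Same interleaving, but thread A only touches the byte that holds the flag.
unsigned narrow_rmw_final_neighbour() {
    Word64 mem = {};
    mem.bytes[0] = 12;

    unsigned char reg = mem.bytes[1];  // thread A: 8-bit load
    mem.bytes[0] = 0;                  // thread B: stores 0 to the neighbour
    reg |= 0x01;                       // thread A: sets the flag bit
    mem.bytes[1] = reg;                // thread A: 8-bit store, neighbour untouched

    return mem.bytes[0];
}

// Usage: the wide access loses thread B's update, the narrow one does not.
void run_model() {
    assert(wide_rmw_final_neighbour() == 12);
    assert(narrow_rmw_final_neighbour() == 0);
}
```

Note that the stale value only comes back because a second writer slipped in between the load and the store; with a single writer the wide store writes back exactly what was loaded.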

My problem is slightly different. I have a class/struct like this:

class Group {

    uint8 adjVariable;
    volatile bool  flag1: 1;
    volatile bool  flag2: 1;
    // so on...
    volatile bool  flag10: 1;
};

These variables are accessed like this:

void Group::fun() {
    Group_Scoped_lock();
    // adjVariable was 12 here.
    if ( adjVariable > 0 ) {
        adjVariable = 0; // <------- EXPLICIT ZERO ASSIGNMENT
    }
    // some code that doesn't affect adjVariable
    flag1 = false;
    flag2 = false;
    flag3 = false;
    assert( adjVariable == 0 ); // <---- This assert trips, reporting that adjVariable is 12!!
}

Before suspecting "bugs" in GCC, I checked whether adjVariable was being accessed without Group_lock() elsewhere. To the best of my ability, I couldn't find any place in the code where this was happening.

Now, since the compiler issues 64-bit reads for bit-fields AND they are volatile, what if a read of adjVariable was issued as part of such a read while the explicit ZERO assignment to adjVariable was still in cache, so that we read the older value of 12 for adjVariable? Could this newly read value then overwrite the explicitly set value and trip the assert? If so, how do I validate this?

The article discusses losing updates to adjacent variables made in other threads, but in my case I suspect we are losing an update to adjVariable made in the same thread because of reads from memory. Is this possible?

We are using an ancient g++ compiler which is only C++98 compliant, on an equally old Fedora release 12 virtual machine. Also, we ran into this issue ONLY once, in a code base that had been running for 6 months.

Pavan Manjunath asked Nov 08 '22 19:11


1 Answer

If adjVariable is not accessed from any other concurrent thread, then it's pretty much guaranteed to be 0 at the point of the assert.

While all the bools share a single memory location and can indeed produce some weird behavior among themselves, adjVariable is a separate memory location, and the compiler has to make sure that all the loads and stores to it appear to happen in a well-defined order conforming to the source code.

If the compiler issues 64-bit writes for bit fields, then it has to protect adjacent memory locations by aligning bit fields to 8 bytes (e.g. there should be 7-byte padding between adjVariable and flag1). I don't see how 64-bit reads can mess with correctness here though.
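One way to see whether your compiler actually inserts such padding is to look at where the flag bits land in the object. The sketch below is only an assumption-laden stand-in: `GroupLayout` is a hypothetical copy of the class with `unsigned char` in place of `uint8`, three flags instead of ten, and the `volatile` qualifiers dropped so the bytes can be inspected with `memset`/`memcpy` (qualifiers are not expected to change the layout, but check against your real class). Offsets are relative to the start of the object, so the last helper also assumes the object itself is 8-byte aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for Group, see the assumptions above.
struct GroupLayout {
    unsigned char adjVariable;
    bool flag1 : 1;
    bool flag2 : 1;
    bool flag3 : 1;
};

// Index of the first byte of the object that changes when flag1 is set.
std::size_t first_byte_of_flags() {
    GroupLayout g;
    std::memset(&g, 0, sizeof g);       // all bytes zero, padding included
    g.flag1 = true;                     // flips exactly one bit somewhere

    unsigned char raw[sizeof g];
    std::memcpy(raw, &g, sizeof g);     // inspect the object representation
    for (std::size_t i = 0; i < sizeof g; ++i) {
        if (raw[i] != 0) return i;
    }
    return sizeof g;                    // not expected to be reached
}

// True when an aligned 8-byte access covering flag1 would also cover
// adjVariable, i.e. when there is no protective padding between them.
bool flags_share_8_byte_word_with_adj() {
    return offsetof(GroupLayout, adjVariable) / 8 == first_byte_of_flags() / 8;
}
```

If `flags_share_8_byte_word_with_adj()` returns true, a 64-bit write to the bit-fields would touch adjVariable as well, so it is worth checking the generated assembly for the width of the stores to the flags.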

While the notion of memory location only works for C++11 and later, the logic still applies to C++98: the only way for adjVariable not to be zero in the assert should be for another thread to write to adjVariable.
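If you want to check the single-threaded case on your own toolchain, a small reproduction along these lines may help. It is a sketch, not your real code: `GroupRepro` and `run_repro` are hypothetical names, `unsigned char` replaces `uint8`, only three flags are declared, and there is no lock because only one thread exists.

```cpp
#include <cassert>

// Hypothetical cut-down version of Group for a single-threaded check.
struct GroupRepro {
    unsigned char adjVariable;
    volatile bool flag1 : 1;
    volatile bool flag2 : 1;
    volatile bool flag3 : 1;

    // Mirrors the sequence in the question and returns adjVariable's value
    // at the point where the original code asserts.
    unsigned fun() {
        if (adjVariable > 0) {
            adjVariable = 0;   // explicit zero assignment
        }
        flag1 = false;         // bit-field stores that follow the assignment
        flag2 = false;
        flag3 = false;
        return adjVariable;
    }
};

// Runs the sequence repeatedly and counts how often adjVariable is non-zero.
unsigned run_repro(unsigned long iterations) {
    unsigned failures = 0;
    for (unsigned long i = 0; i < iterations; ++i) {
        GroupRepro g = {12, true, true, true};
        if (g.fun() != 0) {
            ++failures;
        }
    }
    return failures;
}
```

A non-zero result from `run_repro` with no other thread involved would point at the compiler; a result of zero is consistent with the explanation that some other thread wrote to adjVariable.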

Ap31 answered Nov 15 '22 06:11