Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What's the race condition in these two interrupt service routines?

I'm using an ARM microcontroller for a real-time systems course in my university. In the project I'm working on at the moment, I'm implementing the vector field histogram (VFH) algorithm.

The problem is: I need to communicate between threads; more specifically, I want to have a thread that gets sensor data from rangefinders, make the necessary transformations to it and deposit them in a queue. Them, another thread must get this data and process it, etc.

At the moment, I'm using a simpler version of it - one thread gets data from the ADCs (SensorAcquisitionHandler), another outputs the average of the first 5 items (at most) to a display (ControlSignalHandler).

/* Queue used to store data from the rangefinder sensors. */
static unsigned int Sensors[100];
static int SensorsHead = 0;
static int SensorsTail = 0;

void SensorAcquisitionHandler(void) {
    /* Clear the interrupt. */
    ADCIntClear(ADC0_BASE, 1);

    int i; /* Index for the measurements buffer. */

    /* There are only 3 rangefinders used. */
    if (ADCSequenceDataGet(ADC0_BASE, 1, rangeBuffer) == 3) {
        /* Put rangeBuffer's data into SensorDataQueue. */
        /* Also, when using SensorDataQueue, must put what's the direction of 
        the corresponding range measurement. */

        /* Critical section ahead!!! Turn off interrupts!!! */
        IntMasterDisable();

        /* Temporarily using the simple FIFO... */
        for (i = 0; i < 3; ++i) {
            if (SensorsHead < 100) {
                Sensors[SensorsHead] = rangeBuffer[i];
                SensorsHead++;
            }
        }

        /* All is fine, turn on interrupts. */
        IntMasterEnable();
    }
}

void ControlSignalHandler(void) {
    /* Clear the timer interrupt. */
    TimerIntClear(TIMER0_BASE, TIMER_TIMA_TIMEOUT);

    unsigned char i; /* Index for the measurements buffer. */
    unsigned long average = 0;
    char buffer[20];

    /* Average first n (n <= 5) elements from Sensors queue. */
    for (i = 0; i < 5 && SensorsTail < SensorsHead; ++i) {
        average += Sensors[SensorsTail];
        SensorsTail++;
    }

    IntMasterDisable();
    average /= i;

    sprintf(buffer, "%d  ", average);

    average = 0;

    if (SensorsTail >= SensorsHead) {
        SensorsTail = 0;
        SensorsHead = 0;
    }

    Display96x16x1StringDraw(buffer, 0, 0);
    IntMasterEnable();
}

The result is relatively stable for some time but, at random intervals, gets extremely high (the result is ~330 almost all the time). Also, when I use a symbolic debugger during the "very high value" moments, the indexes SensorTail and SensorHead can get to 300+ (the queue is a 100-element array).

This sounds like some kind of overflow, but I can't visualize how it's happening. Can someone help me find it?

I know the answer to the problem is "use a thread-safe queue", but I want to understand how a race condition is happening here, how the indexes are getting messed up, etc. Thank you!

like image 240
agarie Avatar asked Feb 18 '23 17:02

agarie


2 Answers

You can avoid the race conditions on the head and tail pointers by using a lock-less single-reader single writer FIFO - one in which the head pointer is only ever written in one thread (or in your case ISR) and the tail written in the other. This means you perform the test for buffer wrapping in each ISR.

If you did this and reset your interrupt source right at the end of the each ISR, you ought not to need any locking at all - globally disabling interrupts as you are doing is very bad manners. Currently you're holding the locks are long time.

Another reason why you need to rewrite your FIFO implementation is:

    for (i = 0; i < 3; ++i) {
        if (SensorsHead < 100) {

Since you're adding 3 readings at a time, you'll eventually enter SensorAcquisitionHandler() with SensorsHead==99 - which guarantees you'll throw 2 readings away.

Similarly:

/* Average first n (n <= 5) elements from Sensors queue. */
for (i = 0; i < 5 && SensorsTail < SensorsHead; ++i) {
    average += Sensors[SensorsTail];
    SensorsTail++;
}

will in some circumstances perform the calculation on rather less than 5 values.

Depending on the ARM part you're using, there isn't hardware divide. Calculating your average over a power-of-two values is much cheaper as it's a single-cycle logical shift.

Finally, I imagine that Display96x16x1StringDraw(buffer, 0, 0); is a particularly expensive operation, and it may bot be thread-safe either. IO is always strictly verboten in ISRs. You probably want another queue between your timer thread and a non-interrupt context - which handles output.

like image 89
marko Avatar answered Mar 05 '23 15:03

marko


By clearing an interrupt, you are allowing it to happen again. Imagine what happens if (e.g.) the ControlSignalHandler gets re-entered in the middle of its first loop because the timer managed to outrun your code...

Wrap the entire function (both of them) in IntMasterDisable + IntMasterEnable, and clear the interrupt after the disable and before the enable. (I would do it immediately before the enable.)

like image 37
Nemo Avatar answered Mar 05 '23 15:03

Nemo