Do I need a semaphore when reading from a global structure?

A fairly basic question, but I don't see it asked anywhere.

Let's say we have a global struct (in C) like so:

struct foo {
  int written_frequently1;
  int read_only;
  int written_frequently2;
};

It seems clear to me that if we have lots of threads reading and writing, we need a semaphore (or other lock) on the written_frequently members, even for reading, since we can't be 100% sure that assignments to this struct will be atomic.

If we want lots of threads to read the read_only member, and none to write it, do we need a semaphore on the struct access just for reading?

(I'm inclined to say no, because the fact that the locations immediately before and after are constantly changed shouldn't affect the read_only member, and multiple threads reading the value shouldn't interfere with each other. But I'm not sure.)


[Edit: I realize now I should have asked this question much better, in order to clarify very specifically what I meant. Naturally, I didn't really grok all of the issues involved when I first asked the question. Of course, if I comprehensively edit the question now, I will ruin all of these great answers. What I meant is more like:

struct bar {
  char written_frequently1[LONGISH_LEN];
  char read_only[LONGISH_LEN];
  char written_frequently2[LONGISH_LEN];
};

The major issue I asked about is, since this data is part of a struct, is it at all influenced by the other struct members, and might it influence them in return?

The fact that the members were ints, and therefore writes are likely atomic, is really just a red herring in this case.]

JXG asked Nov 05 '08 16:11


3 Answers

Readers need mutexes, too!

There seems to be a common misconception that mutexes are for writers only, and that readers don't need them. This is wrong, and this misconception is responsible for bugs that are extremely difficult to diagnose.

Here's why, in the form of an example.

Imagine a clock that updates every second with the code:

if (++seconds > 59) {          // Was the time hh:mm:59?
    seconds = 0;               // Wrap seconds..
    if (++minutes > 59) {      // ..and increment minutes.  Was it hh:59:59?
        minutes = 0;           // Wrap minutes..
        if (++hours > 23)      // ..and increment hours.  Was it 23:59:59?
            hours = 0;         // Wrap hours.
    }
}

If the code is not protected by a mutex, another thread can read the hours, minutes, and seconds variables while an update is in progress. Following the code above:

[Start just before midnight] 23:59:59
[WRITER increments seconds]  23:59:60
[WRITER wraps seconds]       23:59:00
[WRITER increments minutes]  23:60:00
[WRITER wraps minutes]       23:00:00
[WRITER increments hours]    24:00:00
[WRITER wraps hours]         00:00:00

The time is wrong from the first increment until the final wrap, six steps in all. If a reader checks the clock during this window, it will see a value that may be not only incorrect (23:00:00) but illegal (23:59:60 or 24:00:00). And since your code is likely to depend on the clock without displaying the time directly, this is a classic source of "ricochet" errors that are notoriously difficult to track down.

The fix is simple.

Surround the clock-update code with a mutex, and create a reader function that also locks the mutex while it executes. Now the reader will wait until the update is complete, and the writer won't change the values mid-read.
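
As a minimal sketch of that fix, assuming POSIX threads: the writer and the reader take the same mutex, so a reader can only ever copy a fully updated time. The names clock_time, clock_tick, clock_read and clock_lock are made up for illustration and are not from the answer above.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names, chosen for this sketch only. */
struct clock_time {
    int hours;
    int minutes;
    int seconds;
};

static struct clock_time now = { 23, 59, 59 };   /* just before midnight */
static pthread_mutex_t clock_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer: advance the clock by one second while holding the lock. */
void clock_tick(void)
{
    pthread_mutex_lock(&clock_lock);
    if (++now.seconds > 59) {
        now.seconds = 0;
        if (++now.minutes > 59) {
            now.minutes = 0;
            if (++now.hours > 23)
                now.hours = 0;
        }
    }
    pthread_mutex_unlock(&clock_lock);
}

/* Reader: copy all three fields under the same lock. */
struct clock_time clock_read(void)
{
    struct clock_time snapshot;

    pthread_mutex_lock(&clock_lock);
    snapshot = now;
    pthread_mutex_unlock(&clock_lock);

    /* A snapshot taken under the lock is never a half-updated time. */
    assert(snapshot.hours >= 0 && snapshot.hours <= 23);
    assert(snapshot.minutes >= 0 && snapshot.minutes <= 59);
    assert(snapshot.seconds >= 0 && snapshot.seconds <= 59);
    return snapshot;
}
```

The reader returns a copy rather than a pointer, so the caller keeps a consistent value after the lock is released.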

Adam Liss answered Nov 03 '22 01:11


If the read_only member is actually read only, then there is no danger of the data being changed and therefore no need for synchronization. This could be data that is set up before the threads are started.

You will want synchronization for any data that can be written, regardless of the frequency.
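
A rough sketch of that arrangement, assuming POSIX threads and the struct bar from the question's edit: read_only is filled in before any thread is created, the neighbouring members are written under a mutex, and the readers take no lock at all. LONGISH_LEN is given an arbitrary value here, and writer, reader, run_read_only_demo and the string "config-v1" are invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define LONGISH_LEN 64   /* assumed value; the question leaves it unspecified */
#define NUM_READERS 4    /* assumed reader count for this sketch */

struct bar {
    char written_frequently1[LONGISH_LEN];
    char read_only[LONGISH_LEN];
    char written_frequently2[LONGISH_LEN];
};

static struct bar shared;   /* static storage, so it starts zero-filled */
static pthread_mutex_t write_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer: changes only the neighbouring members, under the lock. */
static void *writer(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&write_lock);
        memset(shared.written_frequently1, 'a' + i % 26, LONGISH_LEN - 1);
        memset(shared.written_frequently2, 'z' - i % 26, LONGISH_LEN - 1);
        pthread_mutex_unlock(&write_lock);
    }
    return NULL;
}

/* Reader: no lock, because nothing writes read_only after startup. */
static void *reader(void *arg)
{
    int *ok = arg;

    *ok = 1;
    for (int i = 0; i < 1000; i++) {
        if (strcmp(shared.read_only, "config-v1") != 0)
            *ok = 0;
    }
    return NULL;
}

/* Returns 1 if every reader always saw the value set before startup. */
int run_read_only_demo(void)
{
    pthread_t w, r[NUM_READERS];
    int ok[NUM_READERS];
    int all_ok = 1;
    int rc;

    /* Set up before any thread exists; pthread_create makes this write
       visible to the threads it starts. */
    strcpy(shared.read_only, "config-v1");

    rc = pthread_create(&w, NULL, writer, NULL);
    assert(rc == 0);
    for (int i = 0; i < NUM_READERS; i++) {
        rc = pthread_create(&r[i], NULL, reader, &ok[i]);
        assert(rc == 0);
    }
    (void)rc;

    pthread_join(w, NULL);
    for (int i = 0; i < NUM_READERS; i++) {
        pthread_join(r[i], NULL);
        all_ok = all_ok && ok[i];
    }
    return all_ok;
}
```

The writes to the two neighbouring arrays touch only their own bytes, so they do not disturb read_only even though all three members live in the same struct.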

Jonathan Adelson answered Nov 02 '22 23:11


"Read only" is a bit misleading, since the variable is written to at least once when it's initialized. In that case you still need a memory barrier between the initial write and subsequent reads if they're in different threads, or else they could see the uninitialized value.

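One way to get that barrier, sketched here with C11 atomics and POSIX threads rather than an explicit fence, is a release store paired with an acquire load on a flag. The names ready, publisher, consumer and run_publish_demo, and the value 42, are assumptions made for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int read_only;       /* written once, then only read */
static atomic_int ready;    /* 0 until read_only is published (static zero-init) */

/* Initialiser thread: write the value, then publish it with a release store. */
static void *publisher(void *arg)
{
    (void)arg;
    read_only = 42;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Reader thread: wait for the flag with an acquire load, then read. */
static void *consumer(void *arg)
{
    int *seen = arg;

    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;   /* spin; a real program would more likely block instead */
    *seen = read_only;
    return NULL;
}

/* Returns the value the reader thread observed. */
int run_publish_demo(void)
{
    pthread_t p, c;
    int seen = -1;
    int rc;

    rc = pthread_create(&c, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, publisher, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

The acquire load that observes the flag guarantees the reader also sees the write to read_only that preceded the release store. If the value is instead set before the reader threads are created, pthread_create already provides this ordering and no flag is needed.
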
sk. answered Nov 03 '22 00:11