Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sorting a vector using threads

Are the vectors defined in the C++ STL re-entrant or thread safe? Can I use two threads and work(in this case sort) on two halfs of a vector without using a mutex? For example:

int size = (Data_List.size())/2;

Thread A()
{

................ // Do we need to lock Data_list with a mutex
 sort(Data_List.begin(),Data_List.begin()+size,cmp);
}

Thread B()
{
....// Do we need to lock Data_list with a mutex
sort(Data_List.begin()+(size+1),Data_List.end(),cmp);
}

My Question is do we need to lock the access of Data_List using a mutex?

Note: the cmp function is a regular int comparision function.

like image 489
tomkaith13 Avatar asked Dec 09 '22 17:12

tomkaith13


2 Answers

As long as the threads are working on different regions of memory and your comparison function only works with that memory and local variables, you should be ok. Essentially, you are "locking" each half of the table by dividing the work between the threads and only letting the thread work on its half of the data.

like image 71
tvanfosson Avatar answered Dec 17 '22 20:12

tvanfosson


Nearly, but not quite. This code is in general not thread-safe, even assuming the implementation makes reasonable guarantees about the thread-safety of vector.

Suppose Data_List::value_type is a type for which your architecture does not provide atomic access. For example on x86, byte-writes and aligned word- and dword-writes are atomic, but short-writes are not, and writes of user-defined types aren't unless they happen to be a good size and alignment. If your UDT has size 3, you could have a problem.

To perform a non-atomic short write, an implementation might do two byte writes. Or it might load the word, modify two of the bytes, and write the word back. On architectures where a byte write is expensive, the latter is quite plausible.

Now, suppose your implementation does the latter. Suppose further that your division of the vector happens to leave the first half of a word in one portion, and the second half of the word in the other. Suppose finally that the two different threads both try to modify that word simultaneously. This can go wrong - they might both read the same value, both modify it, but then one write back will occur before the other, so one of the modifications will be overwritten and lost.

Atomic byte-access by default isn't universal, and I doubt any implementation does atomic bit-access by default. So a vector<bool> is a possible problem even if other vector types are safe: your division of the vector could go down the middle of a byte. Not that you'd generally sort bools this way, but there are other operations where you might try to divide vectors, or your code might be generic.

You can usually expect word-access to be atomic (and in C or C++, int access). But it's not guaranteed by the language standard, hence sig_atomic_t. You say your cmp is an int comparison function, which suggests that maybe your vector contains ints. But you can compare shorts perfectly well using an int comparison function, so I don't know.

What you actually need to do is check your implementation/platform very carefully, and see what it says about safely accessing memory from multiple threads. It's probably fine for a vector of int.

like image 23
Steve Jessop Avatar answered Dec 17 '22 19:12

Steve Jessop