 

Multithreaded linked list traversal

Given a (doubly) linked list of objects (C++), I have an operation that I would like to multithread across each object. The cost of the operation is not uniform per object. The linked list is the preferred storage for this set of objects for a variety of reasons. The first element in each object is a pointer to the next object in the list; the second is a pointer to the previous one.

I have solved the problem by building an array of node pointers and applying OpenMP. This gave decent performance. I then switched to my own threading routines (based on Windows primitives); by using InterlockedIncrement() on the index into the array, I can achieve higher overall CPU utilization and faster throughput. Essentially, the threads "leapfrog" along the elements.

My next approach to optimization is to eliminate creating/reusing the array built from my linked list. However, I'd like to keep this "leapfrog" approach, using some nonexistent routine that could be called "InterlockedCompareDereference" - one that atomically compares a pointer against NULL (end of list) and, if non-NULL, dereferences it and stores the result, returning the dereferenced value.

I don't think InterlockedCompareExchangePointer() will work, since I cannot atomically dereference the pointer and then call the Interlocked() method. I've done some reading, and others suggest critical sections or spin locks. Critical sections seem heavyweight here. I'm tempted to try spin locks, but I thought I'd first pose the question here and ask what other people are doing. I'm not convinced that InterlockedCompareExchangePointer() by itself could be used like a spin lock. Then one also has to consider acquire/release/fence semantics...
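For what it's worth, the effect of the hypothetical "InterlockedCompareDereference" can be had with a tiny spin lock around the shared cursor: lock, read the current node, advance past it, unlock. A sketch using `std::atomic<bool>` (the Windows equivalent would be InterlockedExchange on a LONG, or an acquire/release pair); `Node` is a placeholder:

```cpp
#include <atomic>

struct Node {
    Node* next = nullptr;
    Node* prev = nullptr;
};

// Minimal test-and-set spin lock with acquire/release semantics.
class SpinLock {
    std::atomic<bool> locked{false};
public:
    void lock()   { while (locked.exchange(true, std::memory_order_acquire)) {} }
    void unlock() { locked.store(false, std::memory_order_release); }
};

// Atomically "compare against NULL and conditionally dereference": returns
// the claimed node (or nullptr at end of list) and advances the shared cursor.
Node* claim_next(SpinLock& lk, Node*& cursor) {
    lk.lock();
    Node* n = cursor;          // observe the current node under the lock
    if (n) cursor = n->next;   // the dereference is protected by the lock
    lk.unlock();
    return n;                  // nullptr signals end of list
}
```

Since the critical section is two loads and a store, contention on the spin lock should be brief even with many threads.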

Ideas? Thanks!

asked Nov 04 '22 15:11 by Rob Bryce


2 Answers

Critical sections are not really heavyweight. When uncontended, entering one is just an interlocked operation in user mode; it only falls back to a kernel wait under contention, so assuming threads can acquire the lock quickly it behaves much like a spin lock.

The solution to your problem will depend on whether or not you are modifying the list. If you aren't modifying it, all you need is something like InterlockedCompareExchange on a counter in each object, initialized to 0. The exchange value is 1: if you get 0 back, you won the node and do your actions; if you get 1 back, another thread claimed it, so you skip. The next time you do actions on the list you exchange in 2 and check for 1/2 instead of 0/1, so you never have to reset the counters between sweeps.
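That per-node generation counter can be sketched with `std::atomic` compare-exchange as a portable stand-in for InterlockedCompareExchange (`Node` and `try_claim` are illustrative names, not from the original post):

```cpp
#include <atomic>

struct Node {
    Node* next = nullptr;
    Node* prev = nullptr;
    std::atomic<unsigned> claimed{0};  // generation this node was last claimed in
};

// Returns true iff this thread won the right to process the node during sweep
// number `gen` (gen starts at 1). On the next sweep, pass gen + 1; no thread
// ever has to reset the counters.
bool try_claim(Node& n, unsigned gen) {
    unsigned expected = gen - 1;       // value left behind by the previous sweep
    return n.claimed.compare_exchange_strong(expected, gen);
}
```

Exactly one thread's compare-exchange succeeds per node per sweep; every other thread sees the current generation value and moves on.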

If you are changing the list then it all depends. If you only move forward and only delete the current node, your best bet would be a per-item "next" lock that you hold while doing the compare-exchange bit, while leapfrogging (reading the item's next node), and while deleting the node.

answered Nov 15 '22 04:11 by CrazyCasta


I think Casta has some good points, and I'll pony up one myself. The solution to this problem is going to depend heavily on the transformations done at each node, whether they're interdependent, and whether the task can be accomplished in a single sweep of the list.

If the operations are not interdependent, and the sweep approach makes sense, I propose a brokered fetch system. It is virtually identical to a work-crew approach. Each thread is handed a reference to the broker, which manages the list. As each thread needs content to process, it requests it from the broker, which locks, obtains the next node, advances the internal sweep iterator, and releases the lock, returning the node to the requesting thread. This continues until the list enumeration is finished.

For accumulator problems, where each node may be visited more than once by different threads, you can use a circular list or some other such container. A talented broker can manage the list, including new inserts, deletes, etc., in the same fashion as it dispatches: lock, action, unlock. There are obviously many activities that can be tailored to your specific needs on the broker side. Those needs can make for an elaborate pool-management system while still being very efficient in terms of lock contention (i.e. almost none).
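The broker described above can be sketched in a few lines; `ListBroker` and `Node` are illustrative names, with a `std::mutex` standing in for whatever lock (critical section, slim reader/writer lock) fits the real code:

```cpp
#include <mutex>

struct Node {
    Node* next = nullptr;
    Node* prev = nullptr;
};

// Minimal broker: worker threads call fetch() for their next piece of work.
// The broker serializes access to the sweep iterator - lock, action, unlock -
// so it is also the natural place to fold in inserts and deletes later.
class ListBroker {
    std::mutex m;
    Node* cursor;
public:
    explicit ListBroker(Node* head) : cursor(head) {}

    Node* fetch() {
        std::lock_guard<std::mutex> g(m);
        Node* n = cursor;
        if (n) cursor = n->next;   // advance the internal sweep iterator
        return n;                  // nullptr: enumeration finished
    }
};
```

Each worker loops on `fetch()` until it returns nullptr; since the broker holds the lock only for a pointer read and a pointer write, contention stays minimal.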

The bottom line, however, is hopefully evident: know your problem set and the specifics of what you want each thread to accomplish with its current node.

answered Nov 15 '22 05:11 by WhozCraig