Design of a high-performance sorted data structure read by many threads and written by few

I have an interesting data structure design problem that is beyond my current expertise. I'm seeking data structure or algorithm answers about tackling this problem.

The requirements:

Store a reasonable number of (pointer address, size) pairs in one location. Each pair is effectively two numbers, and the first is useful as a sorting key.
In a highly threaded application, many threads will look up values to see whether a specific pointer lies within one of the (address, size) pairs. That is, if each pair defines a memory range, the lookup asks whether the pointer is within any range in the list. Threads will much more rarely add or remove entries from this list.
Reading or searching for values must be as fast as possible, happening hundreds of thousands to millions of times a second.
Adding or removing values, i.e. mutating the list, happens much more rarely; performance is not as important.
It is acceptable but not ideal for the list contents to be out of date, i.e. for a thread's lookup code to not find an entry that should exist, so long as at some point the entry will exist.

I am keen to avoid a naive implementation such as having a critical section to serialize access to a sorted list or tree. What data structures or algorithms might be suitable for this task?

Tagged with Delphi since I am using that language for this task. Language-agnostic answers are very welcome.

However, I probably cannot use any of the standard libraries in any language without a lot of care. The reason is that memory access (allocation, freeing, etc of objects and their internal memory, eg tree nodes, etc) is strictly controlled and must go through my own functions. My current code elsewhere in the same program uses red/black trees and a bit trie, and I've written these myself. Object and node allocation runs through custom memory allocation routines. It's beyond the scope of the question, but is mentioned here to avoid an answer like 'use STL structure foo.' I'm keen for an algorithmic or structure answer that, so long as I have the right references or textbooks, I can implement myself.

395

asked Nov 28 '13 13:11

David

2 Answers

I would use a TDictionary<Pointer, Integer> (from Generics.Collections) combined with a TMREWSync (from SysUtils) for multi-read exclusive-write access. TMREWSync allows multiple readers simultaneous access to the dictionary, as long as no writer is active. The dictionary itself provides O(1) lookup of pointers.

If you don't want to use the RTL classes the answer becomes: use a hash map combined with a multi-read exclusive-write synchronization object.

EDIT: Just realized that your pairs really represent memory ranges, so a hash map does not work. In this case you could use a sorted list (sorted by memory address) and then use binary search to quickly find a matching range. That makes the lookup O(log n) instead of O(1), though.
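
To make the sorted-list idea concrete, here is a minimal sketch in C++17 rather than Delphi (the question welcomes language-agnostic answers). std::shared_mutex stands in for TMREWSync, and every name in it (Range, RangeList, startsAfter) is made up for illustration. It assumes the stored ranges never overlap, and it uses std::vector, which a real implementation with custom allocation would replace with its own array.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Hypothetical types for illustration; ranges are assumed not to overlap.
struct Range {
    std::uintptr_t base;  // start address, used as the sort key
    std::size_t size;     // length in bytes
};

class RangeList {
public:
    // Writer: takes the exclusive lock and keeps the vector sorted by base.
    void add(std::uintptr_t base, std::size_t size) {
        assert(size > 0);
        std::unique_lock<std::shared_mutex> lock(mutex_);
        auto pos = std::upper_bound(ranges_.begin(), ranges_.end(), base, startsAfter);
        ranges_.insert(pos, Range{base, size});
    }

    // Writer: removes the range that starts exactly at base, if any.
    bool remove(std::uintptr_t base) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        auto pos = std::upper_bound(ranges_.begin(), ranges_.end(), base, startsAfter);
        if (pos == ranges_.begin() || (pos - 1)->base != base) return false;
        ranges_.erase(pos - 1);
        return true;
    }

    // Reader: takes the shared lock and does an O(log n) binary search.
    bool contains(std::uintptr_t p) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        // Find the first range starting after p; the candidate is the one before it.
        auto pos = std::upper_bound(ranges_.begin(), ranges_.end(), p, startsAfter);
        if (pos == ranges_.begin()) return false;  // p is below every range
        const Range& candidate = *(pos - 1);
        return p - candidate.base < candidate.size;
    }

private:
    // Comparator for upper_bound: true if the range starts after addr.
    static bool startsAfter(std::uintptr_t addr, const Range& r) { return addr < r.base; }

    mutable std::shared_mutex mutex_;
    std::vector<Range> ranges_;
};
```

Inserting into a sorted array costs O(n), which seems acceptable here because writes are rare; a balanced tree would give O(log n) writes at the price of poorer cache locality on lookups.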

141

answered Nov 06 '22 12:11

jpfollenius

Exploring the replication idea a bit ...

From a correctness point of view, reader/writer locks will do the job. In practice, however, while readers may be able to access the structure concurrently and in parallel, they will create heavy contention on the lock, for the obvious reason that locking even for read access involves writing to the lock itself. This will kill performance on a multi-core system, and even more so on a multi-socket system.

The reason for the low performance is the cache line invalidation/transfer traffic between cores/sockets. (As a side note, here's a very recent and very interesting study on the subject: "Everything You Always Wanted to Know About Synchronization but Were Afraid to Ask".)

Naturally, we can avoid the inter-core cache transfers triggered by readers by making a copy of the structure on each core and restricting reader threads to accessing only the copy local to the core they are currently executing on. This requires some mechanism for a thread to obtain its current core id. It also relies on the operating system scheduler not moving threads across cores gratuitously, i.e. maintaining core affinity to some extent. AFAICT, most current operating systems do this.

As for the writers, their job would be to update all the existing replicas, obtaining each replica's lock for writing. Updating one tree at a time (apparently the structure should be some kind of tree) does mean a temporary inconsistency between replicas. From the problem description this seems acceptable. While a writer works, it blocks the readers on a single core, but not all readers. The drawback is that a writer has to perform the same work many times: as many times as there are cores or sockets in the system.
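
A rough C++17 sketch of the replication scheme, with several assumptions of mine: a sorted vector stands in for the tree, a hash of the thread id stands in for the current core id (the real call is platform-specific), an extra writer mutex keeps updates in the same order on every replica, and 64 bytes is assumed as the cache line size. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

struct Range {
    std::uintptr_t base;  // start address, used as the sort key
    std::size_t size;     // length in bytes
};

class ReplicatedRanges {
public:
    explicit ReplicatedRanges(std::size_t replicaCount) : replicas_(replicaCount) {
        assert(replicaCount > 0);
    }

    // Assumption: stand-in for "current core id". A real version would ask the OS.
    std::size_t slotForThisThread() const {
        return std::hash<std::thread::id>{}(std::this_thread::get_id()) % replicas_.size();
    }

    // Reader: touches only the lock and data of one replica.
    bool contains(std::size_t slot, std::uintptr_t p) const {
        const Replica& r = replicas_[slot % replicas_.size()];
        std::shared_lock<std::shared_mutex> lock(r.mutex);
        auto pos = std::upper_bound(r.ranges.begin(), r.ranges.end(), p, startsAfter);
        if (pos == r.ranges.begin()) return false;
        --pos;
        return p - pos->base < pos->size;
    }

    // Writer: repeats the update on each replica, locking one at a time,
    // so replicas are briefly inconsistent with each other.
    void add(std::uintptr_t base, std::size_t size) {
        std::lock_guard<std::mutex> serialize(writerMutex_);
        for (Replica& r : replicas_) {
            std::unique_lock<std::shared_mutex> lock(r.mutex);
            auto pos = std::upper_bound(r.ranges.begin(), r.ranges.end(), base, startsAfter);
            r.ranges.insert(pos, Range{base, size});
        }
    }

    // Writer: removes the range starting exactly at base from every replica.
    bool remove(std::uintptr_t base) {
        std::lock_guard<std::mutex> serialize(writerMutex_);
        bool removed = false;
        for (Replica& r : replicas_) {
            std::unique_lock<std::shared_mutex> lock(r.mutex);
            auto pos = std::upper_bound(r.ranges.begin(), r.ranges.end(), base, startsAfter);
            if (pos == r.ranges.begin() || (pos - 1)->base != base) continue;
            r.ranges.erase(pos - 1);
            removed = true;
        }
        return removed;
    }

private:
    // Each replica sits on its own cache line(s) to avoid false sharing.
    struct alignas(64) Replica {
        mutable std::shared_mutex mutex;
        std::vector<Range> ranges;
    };

    static bool startsAfter(std::uintptr_t addr, const Range& r) { return addr < r.base; }

    std::mutex writerMutex_;          // orders writers; readers never touch it
    std::vector<Replica> replicas_;   // one per core (or socket)
};
```

A reader would call something like `ranges.contains(ranges.slotForThisThread(), address)`. The read lock is still written to on every lookup, but only by threads sharing that replica, so the cache line should mostly stay local to one core.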

PS.

Maybe, just maybe, another alternative is some RCU-like approach, but I don't know this well, so I'll just stop after mentioning it :)
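
For what it's worth, one common shape of the RCU-like idea is copy-on-write publication: readers load a pointer to an immutable snapshot and search it without taking any lock, while a writer copies the snapshot, modifies the copy, and publishes it with a single atomic store. The C++14 sketch below is only an illustration under loud assumptions: names are hypothetical, and instead of freeing old snapshots after a grace period (the hard part of real RCU) it simply keeps every snapshot alive until the container is destroyed, so memory grows with each write.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

struct Range {
    std::uintptr_t base;  // start address, used as the sort key
    std::size_t size;     // length in bytes
};

class SnapshotRanges {
public:
    using Snapshot = std::vector<Range>;  // sorted by base, never modified once published

    SnapshotRanges() { publish(std::make_unique<Snapshot>()); }

    // Reader: one atomic load, then a binary search. No lock, no shared writes.
    bool contains(std::uintptr_t p) const {
        const Snapshot* snap = current_.load(std::memory_order_acquire);
        auto pos = std::upper_bound(snap->begin(), snap->end(), p, startsAfter);
        if (pos == snap->begin()) return false;
        --pos;
        return p - pos->base < pos->size;
    }

    // Writer: copy the current snapshot, modify the copy, publish it.
    void add(std::uintptr_t base, std::size_t size) {
        assert(size > 0);
        std::lock_guard<std::mutex> lock(writerMutex_);
        auto next = std::make_unique<Snapshot>(*current_.load(std::memory_order_relaxed));
        next->insert(std::upper_bound(next->begin(), next->end(), base, startsAfter),
                     Range{base, size});
        publish(std::move(next));
    }

    // Writer: removes the range starting exactly at base, if any.
    bool remove(std::uintptr_t base) {
        std::lock_guard<std::mutex> lock(writerMutex_);
        auto next = std::make_unique<Snapshot>(*current_.load(std::memory_order_relaxed));
        auto pos = std::upper_bound(next->begin(), next->end(), base, startsAfter);
        if (pos == next->begin() || (pos - 1)->base != base) return false;
        next->erase(pos - 1);
        publish(std::move(next));
        return true;
    }

private:
    // Makes the new snapshot visible to readers and takes ownership of it.
    // Simplification: old snapshots are never freed early, because a reader
    // might still be searching one. Real RCU frees them after a grace period.
    void publish(std::unique_ptr<Snapshot> next) {
        current_.store(next.get(), std::memory_order_release);
        owned_.push_back(std::move(next));
    }

    static bool startsAfter(std::uintptr_t addr, const Range& r) { return addr < r.base; }

    std::atomic<const Snapshot*> current_{nullptr};
    std::mutex writerMutex_;                        // serializes writers only
    std::vector<std::unique_ptr<Snapshot>> owned_;  // every snapshot ever published
};
```

A reader that loaded the old pointer just before a publish sees slightly stale contents, which matches the "out of date is acceptable" requirement in the question. Whether unbounded retention is tolerable depends on how rare writes really are; otherwise some reclamation scheme (epochs, hazard pointers) would be needed.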

answered Nov 06 '22 12:11

chill

Tags: algorithm, data-structures, multithreading, thread-safety, delphi
