Software transactional memory with a big, shared piece of data

Tags: haskell, stm

The original question

I'm new to STM. One thing I'd like to do in Haskell involves a big piece of data, and lots of lightweight threads reading and writing to small parts of said big piece of data. The locations read from and written to can be considered essentially random and small. STM seems great for this, but I have some questions regarding how to attack the problem. I see several possible approaches, each with some drawbacks, and some that seem plain stupid. Some comments on these, or other ideas for alternatives, would be much appreciated.

Let's for simplicity assume that the big piece of data is a Data.Vector a, where the elements are themselves small.

  1. The entire piece of data as one TVar (Vector a). I guess this will cause lots of copying of the huge piece of data, as STM will think each individual write has possibly affected the entire shared data. Surely there's no magic going on where STM identifies that the reads and writes are very localized, and that consistency across the big piece of data is not required?

  2. A huge number of TVar as, essentially one for each element, giving fully localized STM, but essentially duplicating the entire Vector a. This seems stupid.

  3. A compromise between 1 and 2 by segmenting the data so that I have a reasonable number of TVar (Vector a)s corresponding to subvectors of the data (a rough sketch of what I mean follows after this list). I feel this solution depends too much on heuristics, like how big the segments should be.

  4. Messaging. Instead of each worker reading and writing to the data using STM, each writes messages with requests for data to be read or chunks of data to be written through some STM mechanism, such as a TChan. A special thread receives these messages, passing out data requested through another TChan, or taking received data and writing it to the shared data structure. This solution seems free of the problems plaguing solutions 1-3, but it also seems to me that it essentially gives up on using the niceties of STM to keep the data consistent. Rather, it's just message passing. Granted, the message passing part is implemented with STM, but my real problem is solved with message passing. STM seems so great, message passing is so... meh.
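
To make approach 3 concrete, here is a rough, untested sketch of what I have in mind. The segment size and the names (Segmented, segSize, mkSegmented, readElem, writeElem) are made up for illustration, and nothing here is benchmarked:

    -- Sketch of approach 3: the data is split into fixed-size segments,
    -- each guarded by its own TVar, so a transaction only touches the
    -- segment containing the element it reads or writes.
    import Control.Concurrent.STM (STM, TVar, newTVarIO, readTVar, writeTVar)
    import qualified Data.Vector as V
    import qualified Data.Vector.Unboxed as VU

    type Segmented = V.Vector (TVar (VU.Vector Int))

    segSize :: Int
    segSize = 1024  -- pure heuristic; exactly the knob I'm unsure about

    mkSegmented :: Int -> IO Segmented
    mkSegmented n =
      V.generateM ((n + segSize - 1) `div` segSize) $ \_ ->
        newTVarIO (VU.replicate segSize 0)

    readElem :: Segmented -> Int -> STM Int
    readElem segs i = do
      let (s, o) = i `divMod` segSize
      seg <- readTVar (segs V.! s)
      return (seg VU.! o)

    writeElem :: Segmented -> Int -> Int -> STM ()
    writeElem segs i x = do
      let (s, o) = i `divMod` segSize
          tv     = segs V.! s
      seg <- readTVar tv
      writeTVar tv (seg VU.// [(o, x)])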

Am I thinking about this correctly? Does anybody have any hints or other suggestions?

Keep in mind that I have no experience with STM, nor have I tried the above solutions yet. I'll get out of my armchair, but sometimes it can be good to think about these things before trying anything.

Addendum: A fifth approach comes from Nathan Howell and uses TArray. It sounds like what I want, but the documentation says:

It is currently implemented as Array ix (TVar e), but it may be replaced by a more efficient implementation in the future (the interface will remain the same, however).

I take this to mean that TArray is just my approach number 2 in nicer clothes. The documentation's hint about "a more efficient implementation" is interesting, as it suggests there actually is a nicer approach out there.
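
For reference, using TArray looks something like the following minimal, untested sketch (the bounds, the element type and the particular update are arbitrary):

    import Control.Concurrent.STM        (atomically)
    import Control.Concurrent.STM.TArray (TArray)
    import Data.Array.MArray             (newArray, readArray, writeArray)

    main :: IO ()
    main = do
      -- a 10-element transactional array, all zeros
      arr <- atomically (newArray (0, 9) 0) :: IO (TArray Int Int)
      -- read one cell and write another in a single atomic transaction
      atomically $ do
        x <- readArray arr 3
        writeArray arr 4 (x + 1)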

Follow-up on Vagif Verdi's answer

Vagif Verdi's answer is very interesting, so I made a little experiment to try it out. I don't have time to slim down the code right now, so those interested will have to bear with me on the code containing more than the bare essentials. I decided to use a mutable vector of 10^8 Ints as the "big shared piece of data", and to let the multiple readers/writers correspond to threads taking input on a network socket.

Note that the code does not even read from or write to the shared piece of data yet. It's simply there, and each thread holds a TVar to it.
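
The full program is longer, but a condensed sketch of the setup looks roughly like this (not the exact code I ran; the port number and the handleClient name are illustrative, and it assumes the old Network module from the network package):

    import Control.Concurrent     (forkIO)
    import Control.Concurrent.STM (TVar, newTVarIO)
    import Control.Monad          (forever)
    import qualified Data.Vector.Unboxed.Mutable as VM
    import Network                (PortID (PortNumber), accept, listenOn, withSocketsDo)
    import System.IO              (BufferMode (LineBuffering), Handle, hGetLine, hSetBuffering)

    main :: IO ()
    main = withSocketsDo $ do
      vec  <- VM.replicate (10 ^ 8) (0 :: Int)  -- the big shared piece of data (~800 MB)
      tvar <- newTVarIO vec                     -- every client thread holds this TVar
      sock <- listenOn (PortNumber 4242)
      forever $ do
        (h, _, _) <- accept sock
        forkIO (handleClient tvar h)

    -- Just echoes whatever the client sends; it never touches the vector.
    handleClient :: TVar (VM.IOVector Int) -> Handle -> IO ()
    handleClient _tvar h = do
      hSetBuffering h LineBuffering
      forever (hGetLine h >>= putStrLn)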

So what happens? I run the program, and immediately it takes up about 780 MB of RAM, which is to be expected (it's roughly what the 10^8 Ints need). Now, if I use netcat to connect a few clients and write a bit of text, which the program is just supposed to print out and not even write to the shared data, the process' CPU usage spikes to 100% for upwards of a second! There's a noticeable lag before the text is displayed. On the bright side, memory usage stays constant (as per Vagif Verdi's answer). If I remove the vector and the TVar, i.e. take out all STM and shared data, everything is very quick and responsive, and there's no noticeable CPU usage whenever a client writes something.

So, while it's very nice to see that the shared data is in fact not duplicated (although I suppose I should actually write to it in order to verify this fully), there's a very heavy performance penalty involved in maintaining coherent state. My question then remains: how should one properly attack this problem while keeping the niceties of STM?

Thanks to Vagif Verdi for bringing up some very interesting points.

asked Mar 03 '12 by gspr

1 Answer

STM isn't magic. If you have one giant TVar, it is going to have to block all threads but one at any given time. This is equivalent to "coarse-grained locking" and has the benefit that it is easy to use, but the whole point of STM is to not be forced into coarse-grained locking. Effectively, only one thread can write at a time with this approach. The same goes for your "message passing" system: one thread is a bottleneck, limiting scalability.

I don't think there is a big problem with using an array of TVars and I don't know why you describe your approach 2 as "stupid". This is exactly what STM was invented to do.
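
To illustrate with a toy sketch (my own names, not from the question; it assumes a reasonably recent stm package for modifyTVar'): two workers update disjoint halves of a vector of TVars, and because each transaction touches a single TVar, the workers never invalidate each other.

    import Control.Concurrent      (forkIO)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
    import Control.Concurrent.STM  (atomically, modifyTVar', newTVarIO, readTVarIO)
    import Control.Monad           (forM_, replicateM)
    import qualified Data.Vector as V

    main :: IO ()
    main = do
      tvars <- fmap V.fromList (replicateM 1000 (newTVarIO (0 :: Int)))
      done  <- newEmptyMVar
      let worker lo hi = do
            forM_ [lo .. hi] $ \i ->
              atomically (modifyTVar' (tvars V.! i) (+ 1))  -- touches one TVar only
            putMVar done ()
      _ <- forkIO (worker 0   499)  -- the two workers write disjoint TVars,
      _ <- forkIO (worker 500 999)  -- so their transactions never conflict
      takeMVar done
      takeMVar done
      total <- fmap sum (mapM readTVarIO (V.toList tvars))
      print total  -- 1000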

Edit: I encourage interested parties to watch this video, or at least the beginning of it, for a discussion of some of the motivation for STM. It is a few years old, and the stuff on Transactional Boosting isn't really relevant, but Herlihy is brilliant and one of the computer scientists who manage to make an area interesting even if it isn't your thing.

answered Oct 02 '22 by Philip JF