 

What is the analogue of ConcurrentHashMap in Haskell?

update: please bear in mind that I've just started learning Haskell

Let's say we're writing an application with the following general functionality:

  • when starting, it gathers some data from an external source;
  • this data is a set of complex structures containing lists, arrays, ints, strings, etc.;
  • when running, the application serves a web API (servlets) that provides access to the data.

Now, if the application were written in Java, we could use a static ConcurrentHashMap object to store the data (as Java objects). During startup the app would fill the map with data, and the servlets would then read from it to provide an API to clients.

If the application were written in Erlang, we could use ETS/DETS to store the data (as native Erlang structures).

Now the question: what is the proper Haskell way to implement such a design? It shouldn't be a database; it should be some kind of lightweight in-memory store that can hold complex structures (native Haskell structures) and can be accessed from different threads (servlets, to use the Java term). Haskell has no static global variables as in Java and no ETS or OTP as in Erlang, so what is the right way to do this without external solutions like Redis?

Thanks

update: another important part of the question: since Haskell doesn't (?) have 'global static' variables, what would be the right way to implement this globally accessible data store (say, using "stm-containers")? Should I initialize it somewhere in the main function and then just pass it to every REST API handler? Or is there a more correct way?

asked Apr 19 '16 at 14:04 by fycth





2 Answers

It's not clear from your question whether the client API will provide ways of mutating the data.

If not (i.e., the API will only be used for querying), then any immutable data structure will suffice. One of the beauties of immutable data is that it can be accessed safely from multiple threads, because you can be sure it won't change, so there is no need for the overhead of locks or other concurrency strategies. You simply construct the immutable data during initialisation and then query it. For this, consider a package like "unordered-containers".
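A minimal sketch of the read-only case, under these assumptions: the Item record, buildCatalog and lookupPrice are hypothetical stand-ins for the question's data and handlers, and the map is Data.Map.Strict from the containers package (bundled with GHC) rather than unordered-containers, whose Data.HashMap.Strict offers a very similar interface. The map is built once and then read by several threads without any locking.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_)
import qualified Data.Map.Strict as Map

-- Hypothetical record standing in for the question's "complex structures".
data Item = Item
  { itemName  :: String
  , itemTags  :: [String]
  , itemPrice :: Int
  } deriving (Show, Eq)

type Catalog = Map.Map String Item

-- Build the immutable map once; after this it can never change.
buildCatalog :: [Item] -> Catalog
buildCatalog items = Map.fromList [ (itemName i, i) | i <- items ]

-- Hypothetical read-only "handler": a pure lookup, so no locks are needed.
lookupPrice :: Catalog -> String -> Maybe Int
lookupPrice catalog name = itemPrice <$> Map.lookup name catalog

main :: IO ()
main = do
  -- Stand-in for "gather data from an external source" at startup.
  let catalog = buildCatalog [Item "apple" ["fruit"] 3, Item "bread" ["bakery"] 2]
  -- Each forked thread reads the same immutable catalog concurrently.
  results <- forM ["apple", "bread", "cheese"] $ \name -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      let price = lookupPrice catalog name
      -- seq forces the lookup inside the forked thread, not in main.
      price `seq` putMVar done (name, price)
    pure done
  -- Wait for every thread and print its answer, in request order.
  forM_ results $ \done -> takeMVar done >>= print
```

Because the catalog is an ordinary immutable value, sharing it is just a matter of passing it to whichever threads need it.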

If your API will also be mutating the data, then you will need mutable data structures optimised for concurrency. "stm-containers" is one package that provides them.
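For the mutable case, here is a minimal sketch using a single TVar holding a Data.Map, from the stm and containers packages that ship with GHC. This is a simpler stand-in, not the stm-containers API: stm-containers is designed to spread a map across many TVars so that transactions touching different keys are less likely to conflict, whereas here every update contends on the one TVar. Store, addTo and lookupKey are hypothetical names.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM
import Control.Monad (forM, replicateM_)
import qualified Data.Map.Strict as Map

-- Hypothetical shared store: one immutable Map swapped atomically inside a TVar.
type Store = TVar (Map.Map String Int)

newStore :: [(String, Int)] -> IO Store
newStore = newTVarIO . Map.fromList

-- Atomic read-modify-write; modifyTVar' applies the function strictly.
addTo :: Store -> String -> Int -> STM ()
addTo store key n = modifyTVar' store (Map.insertWith (+) key n)

lookupKey :: Store -> String -> STM (Maybe Int)
lookupKey store key = Map.lookup key <$> readTVar store

main :: IO ()
main = do
  store <- newStore [("hits", 0)]
  -- Ten threads each perform 100 atomic increments of the same key.
  dones <- forM [1 .. 10 :: Int] $ \_ -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      replicateM_ 100 (atomically (addTo store "hits" 1))
      putMVar done ()
    pure done
  mapM_ takeMVar dones
  total <- atomically (lookupKey store "hits")
  print total  -- no lost updates: Just 1000
```

Since addTo and lookupKey are STM actions, several of them can be combined into one atomically block when a request needs to read and update more than one key consistently.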

answered Nov 10 '22 at 13:11 by Nikita Volkov



First off, I'm going to assume you mean it needs to be available to multiple threads, not multiple processes. (The difference being that threads share memory, processes do not.) If that assumption is wrong, much of your question doesn't make sense.

So, the first important point: Haskell has mutable data structures. They can easily be shared between threads. Here's a small example:

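import Control.Concurrent
import Control.Monad

main :: IO ()
main = do
    -- A shared mutable cell holding an Int, visible to every thread.
    v <- newMVar 0 :: IO (MVar Int)
    -- Writer thread: repeatedly take the value and put back its successor.
    -- ($!) forces x + 1 before it is stored, so no thunk chain builds up.
    _ <- forkIO . forever $ do
        x <- takeMVar v
        putMVar v $! x + 1
    -- Reader (main thread): sample the shared value ten times.
    forM_ [1 .. 10 :: Int] $ \_ -> do
        x <- readMVar v
        threadDelay 100  -- pause 100 microseconds between samples
        print x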

Note the use of ($!) when putting the value in the MVar. MVars don't force their contents to be evaluated, so without it the MVar would accumulate a growing chain of unevaluated (x + 1) computations. There's some subtlety in making sure everything works properly, and you will get lots of space leaks until you understand Haskell's evaluation model. That's part of why this sort of thing is usually done in a library that handles all those details.
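To make the ($!) point concrete, here is a small sketch contrasting a lazy and a strict update of the same kind of MVar. bumpLazy and bumpStrict are hypothetical helpers; both use modifyMVar_ from base, which restores the old value if the update throws, unlike a bare takeMVar/putMVar pair. Note that ($!) only evaluates to weak head normal form: that is enough for an Int, but for a map you would also want a value-strict structure such as Data.Map.Strict.

```haskell
import Control.Concurrent.MVar
import Control.Monad (replicateM_)

-- Lazy: stores the unevaluated thunk (x + 1), which points at the previous
-- thunk, so repeated calls build a chain that is only forced on the final read.
bumpLazy :: MVar Int -> IO ()
bumpLazy v = modifyMVar_ v (\x -> pure (x + 1))

-- Strict: ($!) evaluates x + 1 before it goes back into the MVar,
-- so the MVar always holds a plain Int.
bumpStrict :: MVar Int -> IO ()
bumpStrict v = modifyMVar_ v (\x -> pure $! x + 1)

main :: IO ()
main = do
  lazyV <- newMVar 0
  strictV <- newMVar 0
  replicateM_ 100000 (bumpLazy lazyV)      -- memory grows with each call
  replicateM_ 100000 (bumpStrict strictV)  -- memory stays constant
  readMVar lazyV >>= print    -- the whole thunk chain is forced here
  readMVar strictV >>= print
```

Both versions print the same number; the difference is only in how much memory is held between updates.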

Given this, the first-pass approach is to just store a map of some sort in an MVar. Unless it's under heavy contention, that actually has pretty good performance.
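A minimal sketch of that first-pass approach, which also shows one straightforward answer to the question's follow-up about global state: create the MVar once in main and pass it explicitly to every handler. The Store type and putHandler/getHandler are hypothetical stand-ins for real REST handlers; no web framework is assumed.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM, forM_)
import qualified Data.Map.Strict as Map

-- Hypothetical shared store: one strict Map inside one MVar.
newtype Store = Store (MVar (Map.Map String Int))

newStore :: IO Store
newStore = Store <$> newMVar Map.empty

-- Hypothetical write handler; ($!) keeps the stored map evaluated.
putHandler :: Store -> String -> Int -> IO ()
putHandler (Store v) key val = modifyMVar_ v (\m -> pure $! Map.insert key val m)

-- Hypothetical read handler; readMVar leaves the MVar full for other threads.
getHandler :: Store -> String -> IO (Maybe Int)
getHandler (Store v) key = Map.lookup key <$> readMVar v

main :: IO ()
main = do
  -- Created once at startup, then handed to every handler as an argument.
  store <- newStore
  -- Stand-in for loading data from an external source.
  forM_ (zip ["a", "b", "c"] [1 ..]) $ \(k, n) -> putHandler store k n
  -- Simulated concurrent requests, each running in its own thread.
  replies <- forM ["a", "b", "z"] $ \k -> do
    reply <- newEmptyMVar
    _ <- forkIO (getHandler store k >>= putMVar reply)
    pure reply
  mapM_ (\r -> takeMVar r >>= print) replies
```

Passing the store as an ordinary argument keeps the dependency visible in each handler's type, which is the usual substitute for a global variable.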

When it is under contention, you have a good secondary fallback, especially when using a hash map: striping. Instead of storing one map in one MVar, use N maps in N MVars. The first step in a lookup is to use the key's hash to determine which of the N MVars to look in.
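A sketch of striping under stated assumptions: keys are Strings, the stripes live in a Data.Array (from the array package bundled with GHC), and toyHash is a hypothetical stand-in for a real hash function such as the one provided by the hashable package. Operations on keys that land in different stripes take different MVars, so they do not block each other.

```haskell
import Control.Concurrent.MVar
import Control.Monad (replicateM)
import Data.Array (Array, bounds, listArray, (!))
import Data.Char (ord)
import Data.List (foldl')
import qualified Data.Map.Strict as Map

-- N independent maps, each guarded by its own MVar.
newtype Striped v = Striped (Array Int (MVar (Map.Map String v)))

newStriped :: Int -> IO (Striped v)
newStriped n = do
  vars <- replicateM n (newMVar Map.empty)
  pure (Striped (listArray (0, n - 1) vars))

-- Hypothetical toy hash; always non-negative, so `mod` below is a valid index.
toyHash :: String -> Int
toyHash = foldl' (\h c -> (h * 31 + ord c) `mod` 1000003) 7

-- Step one of every operation: pick the stripe from the key's hash.
stripeFor :: Striped v -> String -> MVar (Map.Map String v)
stripeFor (Striped arr) key = arr ! (toyHash key `mod` n)
  where
    n = snd (bounds arr) + 1

insertS :: Striped v -> String -> v -> IO ()
insertS s key val = modifyMVar_ (stripeFor s key) (\m -> pure $! Map.insert key val m)

lookupS :: Striped v -> String -> IO (Maybe v)
lookupS s key = Map.lookup key <$> readMVar (stripeFor s key)

main :: IO ()
main = do
  store <- newStriped 8
  mapM_ (\(k, v) -> insertS store k v) [("apple", 3 :: Int), ("bread", 2)]
  lookupS store "apple" >>= print
  lookupS store "cheese" >>= print
```

The stripe count is fixed at creation; choosing it somewhat larger than the expected number of concurrently active threads is a common rule of thumb.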

There are fancy lock-free algorithms that could be implemented using finer-grained mutable values. But in general, they take a lot of engineering effort for a few percent improvement in performance, which doesn't really matter in most use cases.

answered Nov 10 '22 at 11:11 by Carl
