
Concurrent File reads / writes in Haskell?

I have large files in which I store binary data. Multiple threads read and write these files, and my current design synchronizes them using a single lock. This way, I have only one Handle in ReadWriteMode open per file, and all threads contend for that single lock whenever they want to do some I/O.

I'd like to improve upon this by allowing multiple readers to work concurrently. What I tried was using an RWLock and having multiple Handles open. The RWLock would ensure that only one thread modifies the file at a time, while many threads (as many as I have Handles open, a compile-time constant) are allowed to read concurrently. When I tried to run this, I was hit by the fact that the runtime allows only one Handle in ReadWriteMode to exist for a file at any time.

How can I resolve this situation? I assume obtaining / releasing a Handle is an expensive operation, so just opening the file in the appropriate mode after acquiring the RWLock is not really an option. Or maybe there is a package offering an API similar to Java FileChannel's read and write methods?

PS: I'd like to support 32bit architectures, so memory-mapped IO is not possible for files > 4GiB, right?

Waldheinz asked Apr 30 '14 18:04





1 Answer

You should build a type around the file handle and a mutex lock. Here's a simple implementation that I think would work for your purposes.

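module SharedHandle (SharedHandle, newSharedHandle, withSharedHandle) where

import Control.Concurrent.MVar (MVar, newMVar, withMVar)
import System.IO (Handle)

-- A file handle paired with a mutex that serializes access to it.
data SharedHandle = SharedHandle Handle (MVar ())

-- Run the action that opens the handle and pair the result with a fresh lock.
newSharedHandle :: IO Handle -> IO SharedHandle
newSharedHandle makeHandle = do
    handle <- makeHandle
    lock <- newMVar ()
    return $ SharedHandle handle lock

-- Run an operation on the handle while holding the lock.
-- withMVar releases the lock even if the operation throws an exception.
withSharedHandle :: SharedHandle -> (Handle -> IO a) -> IO a
withSharedHandle (SharedHandle handle lock) operation =
    withMVar lock $ \() -> operation handle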

What's going on here is that I've created a new datatype which is, at its essence, just a file handle. The only difference is that it also comes with its own individual mutex lock, implemented with an MVar. I have provided two functions for operating on this new type. newSharedHandle takes an operation that would create a normal handle and creates a shared handle with a fresh lock. withSharedHandle takes an operation that works on a handle, locks the shared handle, performs the operation, and then unlocks the handle. Note that neither the constructor nor any accessors are exported from the module, so we can be assured that no thread ever forgets to free the lock and that we never get deadlocks on one particular access.

Replacing all file handles in your program with this new type could solve your problem.
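
For illustration, here is a minimal sketch of how such a shared handle might be used from several threads. It is not part of the module above: recordSize, writeRecord, readRecord, demo and the temporary file name are hypothetical names made up for this example, and the SharedHandle type is repeated inline (using withMVar for the locking) only so that the file compiles on its own. Each thread seeks and writes inside a single withSharedHandle call, so the seek and the write cannot be interleaved with another thread's I/O.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, newMVar, putMVar, takeMVar, withMVar)
import Control.Exception (finally)
import Control.Monad (forM, replicateM)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- Inlined copy of the shared-handle idea so this file stands alone.
data SharedHandle = SharedHandle Handle (MVar ())

withSharedHandle :: SharedHandle -> (Handle -> IO a) -> IO a
withSharedHandle (SharedHandle h lock) op = withMVar lock (\_ -> op h)

-- Hypothetical fixed record size in bytes, chosen for the example.
recordSize :: Integer
recordSize = 8

-- Hypothetical helper: overwrite record i with recordSize copies of c.
-- Seek and write happen under one lock acquisition.
writeRecord :: SharedHandle -> Integer -> Char -> IO ()
writeRecord sh i c = withSharedHandle sh $ \h -> do
    hSeek h AbsoluteSeek (i * recordSize)
    hPutStr h (replicate (fromInteger recordSize) c)

-- Hypothetical helper: read record i back as a String.
readRecord :: SharedHandle -> Integer -> IO String
readRecord sh i = withSharedHandle sh $ \h -> do
    hSeek h AbsoluteSeek (i * recordSize)
    replicateM (fromInteger recordSize) (hGetChar h)

-- Write four records from four threads, then read them back in order.
demo :: IO [String]
demo = do
    tmpDir <- getTemporaryDirectory
    (path, h) <- openBinaryTempFile tmpDir "shared.bin"
    lock <- newMVar ()
    let sh = SharedHandle h lock
        writers = zip [0 ..] "abcd"
    -- Pre-size the file so no writer has to seek past the end.
    hSetFileSize h (recordSize * fromIntegral (length writers))
    -- One thread per record; each signals completion through its own MVar.
    dones <- forM writers $ \(i, c) -> do
        done <- newEmptyMVar
        _ <- forkIO (writeRecord sh i c `finally` putMVar done ())
        return done
    mapM_ takeMVar dones
    records <- mapM (readRecord sh . fst) writers
    hClose h
    removeFile path
    return records

main :: IO ()
main = demo >>= print
```

Running this should print the four records, each consisting of one repeated letter. Note that this still serializes all access, readers included; it makes sharing one Handle safe rather than letting reads proceed in parallel.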

mmachenry answered Sep 21 '22 15:09
