I have large files where I store Binary data. Multiple threads read and write these files, and my current design synchronizes them using a single Lock. This way, I have only one Handle in ReadWriteMode open for a file, and all threads fight for that single lock when they feel like doing some I/O.
I'd like to improve upon this by allowing multiple readers to work concurrently. I tried using an RWLock and having multiple Handles open. The RWLock would ensure that only one thread modifies the file, while many threads (as many as I have handles open, a compile-time constant) are allowed to read concurrently. When I ran this, I hit the restriction that the runtime allows only one Handle in ReadWriteMode to exist for a file at any time.
How can I resolve this situation? I assume obtaining / releasing a Handle is an expensive operation, so just opening the file in the appropriate mode after acquiring the RWLock is not really an option. Or maybe there is a package offering an API similar to Java FileChannel's read and write methods?
PS: I'd like to support 32-bit architectures, so memory-mapped I/O is not possible for files > 4GiB, right?
Concurrent Haskell is the name given to GHC's concurrency extension. It is enabled by default, so no special flags are required. The Concurrent Haskell paper is still an excellent resource, as is Tackling the awkward squad.
Typically Haskell threads are an order of magnitude or two more efficient (in terms of both time and space) than operating system threads. The downside of having lightweight threads is that, under the non-threaded runtime, only one can run at a time, so if one thread blocks in a foreign call, for example, the other threads cannot continue. Linking with -threaded lifts this restriction.
You should build a type around the file handle and a mutex lock. Here's a simple implementation that I think would work for your purposes.
module SharedHandle (SharedHandle, newSharedHandle, withSharedHandle) where

import Control.Concurrent.MVar
import System.IO

-- A file handle paired with its own mutex.
data SharedHandle = SharedHandle Handle (MVar ())

-- Run the handle-creating action and attach a fresh, unlocked mutex.
newSharedHandle :: IO Handle -> IO SharedHandle
newSharedHandle makeHandle = do
    handle <- makeHandle
    lock <- newMVar ()
    return $ SharedHandle handle lock

-- Run an operation with exclusive access to the handle.
-- withMVar releases the lock even if the operation throws an exception.
withSharedHandle :: SharedHandle -> (Handle -> IO a) -> IO a
withSharedHandle (SharedHandle handle lock) operation =
    withMVar lock $ \() -> operation handle
What's going on here is that I've created a new datatype which is, at its essence, just a file handle. The only difference is that it also comes with its own individual mutex lock, implemented with an MVar. I have provided two functions for operating on this new type. newSharedHandle takes an operation that would create a normal handle and creates a shared handle with a fresh lock. withSharedHandle takes an operation on handles, locks the shared handle, performs the operation, and then unlocks the handle. Notice that the constructor and accessors are not exported from the module, so the lock can only be taken through withSharedHandle. No thread can forget to release it, and we never get deadlocks on one particular access.
Replacing all file handles in your program with this new type could solve your problem.
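As a rough illustration of how this might be used, here is a minimal sketch in which several threads write fixed-size records to one file through a shared handle. It inlines a copy of the SharedHandle type so the file compiles on its own, and it uses withMVar so the lock is released even if an operation throws. The names writeRecord and demo, the record size, and the temporary file name are all made up for this example. Note that the seek and the write sit inside a single withSharedHandle call, so no other thread can move the file position in between.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Exception (finally)
import Control.Monad (forM, forM_)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO

-- Inlined copy of the SharedHandle idea: a handle plus its own mutex.
data SharedHandle = SharedHandle Handle (MVar ())

-- Run an operation with exclusive access; withMVar unlocks on exceptions too.
withSharedHandle :: SharedHandle -> (Handle -> IO a) -> IO a
withSharedHandle (SharedHandle h lock) op = withMVar lock $ \() -> op h

-- Hypothetical helper: fill one fixed-size slot with a single character.
-- Seek and write happen under one lock acquisition.
writeRecord :: SharedHandle -> Int -> Integer -> Char -> IO ()
writeRecord sh size slot c = withSharedHandle sh $ \h -> do
    hSeek h AbsoluteSeek (slot * fromIntegral size)
    hPutStr h (replicate size c)

-- Spawn a few writers, wait for all of them, and report the file size.
demo :: IO Integer
demo = do
    tmp <- getTemporaryDirectory
    (path, h) <- openBinaryTempFile tmp "shared.bin"
    lock <- newMVar ()
    let sh = SharedHandle h lock
        recordSize = 16
        writers = 4 :: Integer
    -- One MVar per worker, filled when that worker finishes (or fails).
    dones <- forM [0 .. writers - 1] $ \slot -> do
        done <- newEmptyMVar
        _ <- forkIO $ writeRecord sh recordSize slot 'x' `finally` putMVar done ()
        return done
    forM_ dones takeMVar
    size <- withSharedHandle sh hFileSize
    hClose h
    removeFile path
    return size

main :: IO ()
main = demo >>= print
```

This still serializes all access, readers included, so it addresses correctness rather than read concurrency.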