
How can I make file I/O more transactional?

Tags: io, haskell

I'm writing CGI scripts in Haskell. When the user hits ‘submit’, a Haskell program runs on the server, updating (i.e. reading in, processing, overwriting) a status file. Reading then overwriting sometimes causes issues with lazy IO, as we may be able to generate a large output prefix before we've finished reading the input. Worse, users sometimes bounce on the submit button and two instances of the process run concurrently, fighting over the same file!

What's a good way to implement

transactionalUpdate :: FilePath -> (String -> String) -> IO ()

where the function (‘update’) computes the new file contents from the old file contents? It is not safe to presume that ‘update’ is strict, but it may be presumed that it is total (robustness to partial update functions is a bonus). Transactions may be attempted concurrently, but no transaction should be able to update if the file has been written by anyone else since it was read. It's ok for a transaction to abort in case of competition for file access. We may assume a source of systemwide-unique temporary filenames.

My current attempt writes to a temporary file, then uses a system copy command to overwrite. That seems to deal with the lazy IO problems, but it doesn't strike me as safe from races. Is there a tried and tested formula that we could just bottle?

pigworker asked Aug 14 '11 19:08




2 Answers

The most idiomatic unixy way to do this is with flock:

  • http://hackage.haskell.org/package/flock
  • http://swoolley.org/man.cgi/2/flock
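As a minimal sketch of the advisory-locking idea, the following uses hTryLock from GHC.IO.Handle.Lock, which ships with recent versions of base, rather than the flock package linked above. The names lockedUpdate and strictUpdate and the ".flock" sidecar suffix are assumptions made for illustration. A transaction that cannot get the lock aborts and returns False.

```haskell
import Control.Exception (evaluate, finally)
import GHC.IO.Handle.Lock (LockMode (ExclusiveLock), hTryLock, hUnlock)
import System.IO (IOMode (AppendMode, ReadMode), hGetContents, withFile)

-- | Run the update under an exclusive advisory lock.
-- Returns True if the update ran, False if another process held the lock.
lockedUpdate :: FilePath -> (String -> String) -> IO Bool
lockedUpdate file upd =
  -- The lock is taken on a sidecar file (hypothetical ".flock" suffix) so the
  -- data file itself can be opened and closed freely while the lock is held.
  withFile (file ++ ".flock") AppendMode $ \lockH -> do
    got <- hTryLock lockH ExclusiveLock   -- non-blocking attempt
    if not got
      then return False                   -- someone else is updating: abort
      else (strictUpdate file upd >> return True) `finally` hUnlock lockH

-- | Read the old contents, compute the new contents completely, then overwrite.
strictUpdate :: FilePath -> (String -> String) -> IO ()
strictUpdate file upd = do
  new <- withFile file ReadMode $ \h -> do
    s <- upd <$> hGetContents h
    mapM_ evaluate s   -- force every Char before the read handle is closed
    return s
  writeFile file new
```

Advisory locks only protect against processes that also take the lock, so every writer of the file would have to go through lockedUpdate. The kernel drops the lock when the process exits, so a crashed transaction does not leave a stale lock behind.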
sclv answered Sep 28 '22 17:09


Here is a rough first cut that relies on the atomicity of the underlying mkdir. It seems to fulfill the specification, but I'm not sure how robust or fast it is:

import Control.DeepSeq
import Control.Exception
import System.Directory
import System.Environment (getArgs)  -- used by the test main below
import System.IO

transactionalUpdate :: FilePath -> (String -> String) -> IO ()
transactionalUpdate file upd = bracket acquire release update
  where
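    acquire = do
      -- createDirectory is atomic and throws if the directory already exists,
      -- so a competing transaction aborts here instead of proceeding.
      let lockName = file ++ ".lock"
      createDirectory lockName
      return lockName
    release = removeDirectory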
    update _ = nonTransactionalUpdate file upd

nonTransactionalUpdate :: FilePath -> (String -> String) -> IO ()
nonTransactionalUpdate file upd = do
  hIn <- openFile file ReadMode
  s <- upd `fmap` hGetContents hIn
  -- Force the whole result before closing, so lazy IO cannot read after hClose.
  s `deepseq` hClose hIn
  hOut <- openFile file WriteMode
  hPutStr hOut s
  hClose hOut

I tested this by adding the following main and throwing a threadDelay in the middle of nonTransactionalUpdate:

main = do
  [n] <- getArgs
  transactionalUpdate "foo.txt" ((show n ++ "\n") ++)
  putStrLn $ "successfully updated " ++ show n

Then I compiled and ran a bunch of instances with this script:

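#!/bin/bash

# -f: do not fail if foo.txt does not exist yet
rm -f foo.txt
touch foo.txt
for i in {1..50}
do
    ./SO $i &
done
# wait for all background instances to finish
wait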

Each process printed a successful update message if and only if its number ended up in foo.txt; all the others printed the expected SO: foo.txt.notveryunique: createDirectory: already exists (File exists).

Update: You actually do not want to use unique names here; it must be a consistent name across the competing processes. I've updated the code accordingly.
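One possible refinement, sketched here under assumptions rather than tested at scale: keep the mkdir lock, but write the new contents to a temporary file and rename it over the original, so that a failing or partial update function never leaves a half-written file. The name atomicUpdate and the ".tmp" suffix are made up for illustration, and the sketch relies on rename being atomic on POSIX systems when both paths are on the same filesystem. It returns False instead of throwing when the lock is already held.

```haskell
import Control.Exception (bracket, evaluate, try)
import Control.Monad (when)
import System.Directory (createDirectory, removeDirectory, renameFile)
import System.IO (IOMode (ReadMode), hGetContents, withFile)
import System.IO.Error (isAlreadyExistsError)

-- | Returns True if the update ran, False if the lock directory already existed.
atomicUpdate :: FilePath -> (String -> String) -> IO Bool
atomicUpdate file upd = bracket acquire release run
  where
    lockDir = file ++ ".lock"   -- same fixed name for all competing processes
    tmp     = file ++ ".tmp"    -- hypothetical name; only used under the lock

    -- Try to take the lock; report contention as False, rethrow anything else.
    acquire = do
      r <- try (createDirectory lockDir)
      case r of
        Right () -> return True
        Left e
          | isAlreadyExistsError e -> return False
          | otherwise              -> ioError e

    -- Only remove the lock directory if this transaction created it.
    release held = when held (removeDirectory lockDir)

    run held
      | not held  = return False
      | otherwise = do
          new <- withFile file ReadMode $ \h -> do
            s <- upd <$> hGetContents h
            mapM_ evaluate s    -- force the full result; an exception here
            return s            -- leaves the original file untouched
          writeFile tmp new
          renameFile tmp file   -- replace the original in one step
          return True
```

If a process is killed while holding the lock, the ".lock" directory stays behind and has to be removed by hand; that is a known weakness of mkdir-style locks compared with kernel-managed locks.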

acfoltzer answered Sep 28 '22 15:09