How dangerous is forkProcess? How can I use it safely?

Tags: fork, haskell, ghc

I’d like to play tricks with forkProcess, where I clone my Haskell process and then let both clones talk to each other (maybe using Cloud Haskell to even send closures around).

But I wonder how well that works with the GHC runtime. Does anyone have experience here?

The documentation for forkProcess says that no other threads are copied, so I assume all data used by other threads will then be garbage collected in the fork, which sounds good. But that means that finalizers will run in both clones, which may or may not be the right thing to do…

I assume I can’t just use it without worry; but are there rules I can follow that will make sure its use is safe?
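For reference, here is a minimal sketch of the "clone and talk" setup, assuming a plain POSIX pipe instead of Cloud Haskell: the process forks a clone with forkProcess, the clone writes one line into a pipe created with createPipe (both from the unix package), and the parent reads it. The helper name roundTrip is hypothetical, and the sketch does nothing about the thread and lock issues discussed in the answer; it assumes no other Haskell threads are running at the moment of the fork.

```haskell
import System.IO (hClose, hGetLine, hPutStrLn)
import System.Posix.IO (closeFd, createPipe, fdToHandle)
import System.Posix.Process (forkProcess, getProcessStatus)

-- Hypothetical helper: fork a clone that sends one line back to the parent
-- over a pipe, and return the line the parent received.
roundTrip :: String -> IO String
roundTrip msg = do
  (readEnd, writeEnd) <- createPipe
  pid <- forkProcess $ do
    -- Child: it only writes, so close its copy of the read end.
    closeFd readEnd
    h <- fdToHandle writeEnd
    hPutStrLn h msg
    hClose h
  -- Parent: it only reads, so close its copy of the write end.
  closeFd writeEnd
  h <- fdToHandle readEnd
  reply <- hGetLine h
  hClose h
  -- Reap the child so it does not linger as a zombie process.
  _ <- getProcessStatus True False pid
  pure reply

main :: IO ()
main = roundTrip "hello from the clone" >>= putStrLn
```

Each process closes the end of the pipe it does not use, since forkProcess leaves both processes with a copy of both descriptors.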

Joachim Breitner asked Nov 12 '20 17:11



1 Answer

But that means that finalizers will run in both clones, which may or may not be the right thing to do…

Finalizers are very rarely used in Haskell, and even where they are used, I would expect them to only have in-process effects. For example, a finalizer calls hClose on garbage-collected Handles if you forgot to do it yourself. This is easy to demonstrate: the following program (which expects a file named input in the current directory) fails with openFile: resource exhausted (Too many open files) on a system whose open-file limit (ulimit -n) is below 1000, but if you uncomment the pure (), the Handles get garbage-collected and the program completes successfully.

import Control.Concurrent
import Control.Monad
import System.IO
import System.Mem

main :: IO ()
main = do
  rs <- replicateM 1000 $ do
    threadDelay 1000  -- not sure why this is needed; probably because GHC runs
                      -- finalizers in a separate thread after a GC, and the
                      -- delay gives that thread a chance to close old Handles
    performGC
    openFile "input" ReadMode
    --pure ()
  print rs  -- force all the Handles to still be alive by this point

File descriptors are process-owned and are copied by forkProcess, so it makes sense for each clone to close its own copies.
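A small sketch of this, assuming a scratch file can be created in the current directory (the file name fork-demo.txt and the function closeInChildDemo are hypothetical): the child closes its copy of a Handle, and the parent's copy keeps working.

```haskell
import System.IO (IOMode (WriteMode), hClose, hPutStrLn, openFile)
import System.Posix.Process (forkProcess, getProcessStatus)

-- Open a file, fork a child that closes its copy of the Handle, then keep
-- writing through the parent's copy. Returns the final file contents.
closeInChildDemo :: FilePath -> IO String
closeInChildDemo path = do
  h <- openFile path WriteMode
  -- Nothing has been written yet, so the Handle's buffer is empty and there
  -- is no pending output that both processes could flush.
  pid <- forkProcess (hClose h)          -- child: close only its own copy
  _ <- getProcessStatus True False pid   -- parent: wait for the child to exit
  hPutStrLn h "parent can still write"   -- the parent's descriptor is still open
  hClose h
  contents <- readFile path
  length contents `seq` pure contents    -- force the lazy read before returning

main :: IO ()
main = closeInChildDemo "fork-demo.txt" >>= putStr  -- hypothetical file name
```

The sketch deliberately forks before writing anything: a Handle's buffer lives in the Haskell heap, so data still buffered at the time of the fork would be copied into the child and could end up being written twice.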

The case which would be problematic is if a finalizer were cleaning up a system-owned resource, e.g. deleting a file. But I hope no library relies on finalizers to delete such resources, because, as the documentation explains, finalizers are not guaranteed to run. So it's better to use something like bracket to clean up resources (although even then cleanup is not guaranteed: for example, if bracket is used from a forkIO thread that is still running when main returns, the program exits without running that thread's cleanup).
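To illustrate the bracket approach in a forking program, here is a sketch assuming a hypothetical scratch file and function name (scratch.txt, scratchDemo), with removeFile and doesFileExist from the directory package. Because the child of forkProcess only runs the action it is given, a child forked from inside a bracket body does not run that bracket's release action, so the system-owned resource is cleaned up once, by the parent.

```haskell
import Control.Exception (bracket)
import System.Directory (doesFileExist, removeFile)
import System.Posix.Process (forkProcess, getProcessStatus)

-- Create a scratch file with bracket, fork a child from inside the bracket
-- body, and report whether the file still exists after bracket returns.
scratchDemo :: FilePath -> IO Bool
scratchDemo path = do
  bracket
    (writeFile path "scratch data\n" >> pure path)  -- acquire
    removeFile                                      -- release, in the parent
    (\p -> do
       pid <- forkProcess $ do
         -- The child runs only this action. It is not inside the bracket,
         -- so it never runs removeFile itself.
         contents <- readFile p
         putStr ("child read: " ++ contents)
       -- Wait for the child, so the parent cleans up only after it is done.
       _ <- getProcessStatus True False pid
       pure ())
  -- If the child had also run removeFile, the parent's removeFile would
  -- have failed with a does-not-exist error instead of returning normally.
  doesFileExist path

main :: IO ()
main = do
  stillThere <- scratchDemo "scratch.txt"  -- hypothetical file name
  putStrLn ("file still exists after bracket: " ++ show stillThere)
```

When run, it should print the child's "child read: scratch data" line followed by "file still exists after bracket: False".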

What the documentation for forkProcess is warning about is not finalizers, but the fact that other threads will appear to end abruptly inside the forked process. This is especially problematic if those threads are holding locks. Normally, two threads can use modifyMVar_ to ensure that only one thread at a time is running a critical section, and as long as each thread only holds the lock for a finite amount of time, the other thread can simply wait for the MVar to become available. If you call forkProcess while one thread is in the middle of a modifyMVar_, however, that thread will not continue in the cloned process, so the MVar stays empty there. The cloned process therefore cannot simply call modifyMVar_, or it could get stuck forever waiting for a nonexistent thread to release the lock. Here is a program demonstrating the problem.

import Control.Concurrent
import Control.Monad
import System.Posix.Process

-- >>> main
-- (69216,"forkIO thread",0)
-- (69216,"main thread",1)
-- (69216,"forkIO thread",2)
-- (69216,"main thread",3)
-- (69216,"forkIO thread",4)
-- (69216,"main thread",5)
-- calling forkProcess
-- forkProcess main thread waiting for MVar...
-- (69216,"forkIO thread",6)
-- (69216,"original main thread",7)
-- (69216,"forkIO thread",8)
-- (69216,"original main thread",9)
-- (69216,"forkIO thread",10)
-- (69216,"original main thread",11)
main :: IO ()
main = do
  mvar <- newMVar (0 :: Int)
  -- Background thread: each iteration holds the lock for about 100ms.
  _ <- forkIO $ replicateM_ 6 $ do
    modifyMVar_ mvar $ \i -> do
      threadDelay 100000
      processID <- getProcessID
      print (processID, "forkIO thread", i)
      pure (i+1)
  -- Let the forkIO thread take the lock first.
  threadDelay 50000
  replicateM_ 3 $ do
    modifyMVar_ mvar $ \i -> do
      threadDelay 100000
      processID <- getProcessID
      print (processID, "main thread", i)
      pure (i+1)
  putStrLn "calling forkProcess"
  -- Child process: only this action runs here. The forkIO thread is not
  -- copied, so if it held the lock at the fork, nobody will release it.
  _ <- forkProcess $ do
    threadDelay 25000
    replicateM_ 3 $ do
      putStrLn "forkProcess main thread waiting for MVar..."
      modifyMVar_ mvar $ \i -> do
        threadDelay 100000
        processID <- getProcessID
        print (processID, "forkProcess main thread", i)
        pure (i+1)
  -- Parent process: carries on sharing the lock with the forkIO thread.
  replicateM_ 3 $ do
    modifyMVar_ mvar $ \i -> do
      threadDelay 100000
      processID <- getProcessID
      print (processID, "original main thread", i)
      pure (i+1)
  threadDelay 100000

As the output shows, the forkProcess main thread gets stuck waiting forever for the MVar and never prints a "forkProcess main thread" line. If you move the threadDelays outside the modifyMVar_ critical sections, the forkIO thread is much less likely to be in the middle of its critical section when forkProcess is called, so you'll see output which looks like this instead:

(69369,"forkIO thread",0)
(69369,"main thread",1)
(69369,"forkIO thread",2)
(69369,"main thread",3)
(69369,"forkIO thread",4)
(69369,"main thread",5)
calling forkProcess
(69369,"forkIO thread",6)
(69369,"original main thread",7)
forkProcess main thread waiting for MVar...
(69370,"forkProcess main thread",6)
(69369,"forkIO thread",8)
(69369,"original main thread",9)
forkProcess main thread waiting for MVar...
(69370,"forkProcess main thread",7)
(69369,"forkIO thread",10)
(69369,"original main thread",11)
forkProcess main thread waiting for MVar...
(69370,"forkProcess main thread",8)

After the forkProcess call, there are now two MVars which both hold the value 6, so in the original process, the original main thread and the forkIO thread are both incrementing one MVar, while in the other process, the forkProcess main thread is incrementing the other.
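One way to avoid the deadlock for a lock you control is to hold that lock yourself across the fork, so no other thread can be inside its critical section at that moment. The sketch below assumes a hypothetical helper, forkProcessHolding (not part of the unix package), built from withMVar and forkProcess. It only protects the single MVar passed to it: any other lock held by another thread at the time of the fork, including the MVar inside every Handle such as stdout, can still leave the child stuck.

```haskell
import Control.Concurrent
import Control.Monad (forever)
import System.Exit (ExitCode (..))
import System.Posix.Process (ProcessStatus (..), forkProcess, getProcessStatus)
import System.Posix.Types (ProcessID)

-- Hypothetical helper (not part of the unix package): fork while holding `lock`.
forkProcessHolding :: MVar a -> IO () -> IO ProcessID
forkProcessHolding lock child =
  -- withMVar waits until no other thread is inside a critical section guarded
  -- by `lock`, then holds it across the fork. In the parent, withMVar puts the
  -- value back; the child is outside withMVar, so it puts its own copy back.
  withMVar lock $ \v -> forkProcess (putMVar lock v >> child)

-- A worker thread spends most of its time inside a modifyMVar_ critical
-- section, as in the demo above. Returns the child's exit status.
lockDemo :: IO (Maybe ProcessStatus)
lockDemo = do
  lock <- newMVar (0 :: Int)
  worker <- forkIO $ forever $ modifyMVar_ lock $ \i -> do
    threadDelay 100000
    pure (i + 1)
  threadDelay 50000  -- let the worker enter its critical section
  pid <- forkProcessHolding lock $ do
    i <- readMVar lock  -- would block forever if the lock were left empty
    putStrLn ("child read counter value " ++ show i)
  status <- getProcessStatus True False pid
  killThread worker
  pure status

main :: IO ()
main = do
  status <- lockDemo
  case status of
    Just (Exited ExitSuccess) -> putStrLn "child exited normally"
    other -> putStrLn ("unexpected child status: " ++ show other)
```

This mirrors the usual advice for fork in multithreaded C programs (the idea behind pthread_atfork): make sure no lock the child will need is held by a thread that does not exist in the child.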

gelisam answered Nov 15 '22 04:11