I have a piece of code that processes files:
processFiles :: [FilePath] -> (FilePath -> IO ()) -> IO ()
This function spawns an async process that executes an IO action. The IO action must be submitted to a cluster through a job scheduling system (e.g. Slurm).
Because I must go through the job scheduling system, it is not possible to use Cloud Haskell to distribute the closure. Instead, the program writes a new Main.hs containing the desired computation, copies it to the cluster node together with all the modules that Main depends on, and executes it remotely with "runhaskell Main.hs [opts]". The async process should then periodically ask the job scheduling system (using threadDelay between queries) whether the job is done.
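Roughly, the submit-and-poll part looks like this (a minimal sketch: the exact Slurm calls, sbatch --wrap and sacct, the 10-second interval, and the helper names are placeholders):

import Control.Concurrent (threadDelay)
import Control.Concurrent.Async (Async, async)
import System.Process (readProcess)

-- Submit "runhaskell Main.hs" as a batch job; sbatch prints
-- "Submitted batch job <id>", so the job id is the last word of its output.
submitJob :: FilePath -> IO String
submitJob mainHs = do
  out <- readProcess "sbatch" ["--wrap", "runhaskell " ++ mainHs] ""
  pure (last (words out))

-- Ask the scheduler for the job state (simplified: real state strings
-- include suffixed variants such as "CANCELLED+").
isJobDone :: String -> IO Bool
isJobDone jid = do
  out <- readProcess "sacct" ["-n", "-X", "-o", "State", "-j", jid] ""
  pure (any (`elem` ["COMPLETED", "FAILED", "CANCELLED", "TIMEOUT"]) (words out))

-- Spawn the async process: submit, then poll every 10 seconds.
runRemotely :: FilePath -> IO (Async ())
runRemotely mainHs = async $ do
  jid <- submitJob mainHs
  let poll = do
        done <- isJobDone jid
        if done then pure () else threadDelay (10 * 1000000) >> poll
  poll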
Is there a way to avoid creating a new Main? Can I serialize the IO action and execute it somehow on the node?
Yep. There is a magical library called packman. It allows you to turn any Haskell value into data (as long as it does not have IORefs or related things in it). Here are the things you would need:
trySerialize :: a -> IO (Serialized a)
deserialize :: Serialized a -> IO a
instance Typeable a => Binary (Serialized a)
Yep, those are the exact types. You can package up your IO actions using trySerialize, use Binary to transfer them to wherever you like, and then deserialize to get the IO action out, ready for use.
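A minimal round-trip sketch, assuming packman's GHC.Packing module and the binary package (here a file stands in for the transfer to the node):

import qualified Data.ByteString.Lazy as BL
import Data.Binary (decode, encode)
import GHC.Packing (Serialized, deserialize, trySerialize)

main :: IO ()
main = do
  let action = putStrLn "hello from the node" :: IO ()
  -- Package the IO action up as plain data and write it out...
  packed <- trySerialize action
  BL.writeFile "action.bin" (encode packed)
  -- ...then, on the other side, read it back and run it.
  packed' <- decode <$> BL.readFile "action.bin" :: IO (Serialized (IO ()))
  act <- deserialize packed'
  act

Note that, per the packman documentation, the serialized data can only be decoded by the same executable, so in practice both ends would have to run the same binary.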
The caveat for packman is that the Binary encoding will probably be huge, since an unevaluated thunk is serialized together with everything it references. Evaluating the thunk first can fix this. Other than that, this seems like what you want!
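To keep the encoding small, one option is to force the value to normal form before packaging it. A minimal sketch, assuming the deepseq package (serializeEvaluated is a made-up name):

import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import GHC.Packing (Serialized, trySerialize)

-- Force the value first, so packman serializes the evaluated result
-- rather than a thunk plus everything the thunk references.
serializeEvaluated :: NFData a => a -> IO (Serialized a)
serializeEvaluated x = evaluate (force x) >>= trySerialize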