Running a Haskell script on a machine without GHC

Tags: compilation, haskell, portability, ghc

This question may or may not be truly Haskell-specific, but it concerns a slight annoyance that I am facing with a certain programming task.

I have written a program in Haskell which is mostly general-purpose for the type of problem I am trying to solve, but it includes two task-specific components: a function that estimates a script's run-time, calculated from trial runs on a certain benchmark, and a file-name conversion function tailored to the naming scheme of the files I was working with. Naturally, if I want to use the script on machines whose performance differs from the benchmark, or I find that the estimates are too conservative, I would like to change the function used to estimate the run-time. Likewise, I would like to be able to modify the file-name conversion function if I ever need to work with files that follow a different naming scheme.

However, the (remote) computer that I am running my scripts on does not have GHC or runhaskell installed, so I have to modify, compile, and re-upload the code from my local machine, which is a bit of a hassle. My question is: is there an easy way to change some components of my code so that the changes take effect at run time, without recompiling?

I apologize if my description is vague. I have put the gory details below rather than clutter the question with them from the outset, in case they prove unnecessary.

I am writing this code in Haskell mainly because it is the language in which I best know how to implement the methods. While I understand that other languages might be more portable, I am not familiar enough with them to implement this without reading a lot of documentation and going through multiple revisions to get it to work. If achieving the flexibility I want with Haskell is impractical, I can appreciate that, but I would rather know that Haskell cannot do it than receive suggestions of other languages that can.

Specific Details

I am writing code to run independent jobs on a load-sharing cluster, so I want to estimate the time required for each job as closely as possible: without under-shooting, which causes the job to be terminated, and without over-shooting, which lowers the priority of the jobs. I am basing my time estimate on the size of the inputs to the job program, and I have gathered enough real-world data to derive an approximate quadratic relation between size and time.
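The "no under-shooting" requirement can be checked mechanically against the collected (size, actual time) data. The sketch below is an illustration, not part of the original script: `estimateMinutes` copies the quadratic used in `size2time` further down, while `underShoots`, `totalPadding` and the observations in `main` are hypothetical names and made-up placeholder numbers, not real measurements.

```haskell
-- Sketch: check a size -> time estimate against observed job runs.
-- The observations in `main` are made-up placeholders, not real measurements.

-- Estimated run-time in minutes for an input of the given size in MB,
-- using the same quadratic as size2time.
estimateMinutes :: Int -> Int
estimateMinutes sizeMB = floor (0.000025 * x * x + 0.03 * x + 10)
  where
    x = fromIntegral sizeMB :: Double

-- Observations (size in MB, actual minutes) that the estimate under-shoots,
-- i.e. jobs that would have been terminated for exceeding their time limit.
underShoots :: (Int -> Int) -> [(Int, Int)] -> [(Int, Int)]
underShoots est obs = [(s, t) | (s, t) <- obs, est s < t]

-- Total over-estimate (padding) in minutes across all observations;
-- a large value suggests the estimate is more conservative than needed.
totalPadding :: (Int -> Int) -> [(Int, Int)] -> Int
totalPadding est obs = sum [est s - t | (s, t) <- obs]

main :: IO ()
main = do
  let observed = [(100, 12), (500, 30), (1234, 80)] -- hypothetical data
  print (underShoots estimateMinutes observed)  -- prints []
  print (totalPadding estimateMinutes observed) -- prints 7
```

If `underShoots` returns a non-empty list for new data, the constant or linear term needs raising, which matches the advice in the `size2time` comment below.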

I currently assign time estimates to the inputs, and thereby establish a job order, by parsing the output of du with a Haskell script, performing a computation, and writing the resulting times to a new file, which the job-assignment script then reads in a loop.
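On the parsing step: `size2time` below splits each du line with `words` and a pattern that fails at run time on any line without exactly two fields (for example, a path containing a space). A total alternative could look like the following sketch. `parseDuLine` is a hypothetical helper, and it assumes that du separates the size from the path with a tab, as GNU du does.

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- Parse one line of `du -m` output into (size in MB, path).
-- Assumes a tab between the two fields, so paths containing spaces survive.
-- Returns Nothing instead of crashing on malformed lines.
parseDuLine :: String -> Maybe (Int, FilePath)
parseDuLine line =
  case break (== '\t') line of
    (sizeStr, '\t' : path) | not (null path) -> do
      size <- readMaybe sizeStr
      return (size, path)
    _ -> Nothing

-- Read du output on standard input and print every line that parsed.
main :: IO ()
main = do
  contents <- getContents
  mapM_ print (mapMaybe parseDuLine (lines contents))
```

Compiled locally, this could be fed with something like `du -m DIR/*.fq.gz | ./parsedu`, where `parsedu` is whatever name the compiled sketch is given.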

The job is run on paired files, which share a common name up to a certain point; the last common character I wish to retain is an 's', and neither name contains a further 's' after it. Therefore, I traverse the names backwards, dropping characters until I reach an 's'. My code is included below. It is liberal with comments, which might help or might confuse; some of them are highly specific to the task I am working with.

-- size2time.hs
-- A Haskell script to convert file sizes into job-times, based on observed job-times for
-- various file sizes
--
--
-- This file may be compiled via the following command:
-- > ghc size2time.hs
--
-- Should any edits be made, ensure that the compiled executable is updated accordingly
--
-- The executable is to be run with the following usage
--
-- > ./size2time inputfile outputfile
--
-- where inputfile is the name of a file whose first column contains the sizes, in MB, of each fq.gz 
-- (including both paired-end reads), and whose second column contains the corresponding file names, as
-- generated by
-- 
-- > du -m $( ls DIR/*.fq.gz ) >inputfile
--
-- where DIR is the directory containing the fq.gz files
--
-- outputfile is the name of a file that will be created by the execution of this script, whose first
-- column will contain the run-time, in minutes, of the corresponding job (the times are based on
-- jobs run on Intel CPUs with 12 cores and 2GB of RAM, and therefore will potentially be
-- inapplicable to jobs run on CPUs of different manufacturers, with different numbers of cores,
-- and/or with different allocated RAM), and whose second column contains the scrubbed names of
-- the jobs to be run. The greater time-value for any given pair is used, with only one member of
-- each pair retained, as the file-names of each member of a pair are identical after scrubbing
--

-- import modules for command line arguments, list operations, map operations
import System.Environment
import Data.List
import qualified Data.Map as Map


main :: IO ()
main = do
    args <- getArgs -- parse command line arguments: inputfile, outputfile, <ignored>
    let infile = head args
        outfile = head . tail $ args -- must align with infile to stay in the same let block
    contents <- readFile infile -- read the inputfile
    let sf = lines contents -- split into lines
        tf = map size2time sf -- perform size2time mapping
        st = map sample tf -- scrub filename
        stu = Map.toList . Map.fromListWith max $ st -- take only the longer of the two times of the paired reads
        tsu = map flip2 stu -- put time first
        stsu = sort tsu -- sort by time, ascending
        tsustr = map unwords . map (\(x,y) -> [show x, y]) $ stsu -- convert back to string
        tsulns = unlines tsustr -- join individual lines
    writeFile outfile tsulns -- write to the outputfile


{- given a string, with the size of a file and the name of the file,
 - returns a tuple with the estimated job-time and the unmodified name
 - of the file.
 -
 - The size-time conversion is extrapolated from experimental data,
 - with only the upper extremes considered in order to prevent timeout,
 - rounding in the quadratic term, and a linear-degree time padding added
 - to allow for upper extremes. If modifications are to be made to any
 - coefficients, it is recommended that only linear and constant terms be increased,
 - and decreases should only be made after performing sufficient alignments to collect
 - enough (file size)--(actual computation time) pairs to verify that the padding is excessive,
 - and to determine coefficients that more closely follow the trend of the actual data, with
 - the conditions that no data point must exceed the approximation curve, and that sufficient padding
 - must be provided to allow for potential inconsistency in the time required for any given size of alignment.
 -}
size2time :: String -> (Int,String)
size2time sfstring = let (size:file:[]) = words sfstring -- parses out size and filename
                         x = fromIntegral (read size :: Int) :: Double -- floating point from numeric string
                         time = floor $ 0.000025 * x ^ 2 + 0.03 * x + 10 -- apply floored conversion
                         tfstring = (time,file)
                     in tfstring



{-
 - removes all characters in the file-name after the last 's', which properly scrubs files of the format
 - *--Hs--R?.fq.gz, where the ? is either 1 or 2. For filenames formatted in different ways,
 - or for alternative naming of the BAM file to be generated, this function must be modified
 - to suit the scenario.
 -}
sample :: (a,String) -> (String,a)
sample (x,f) = let s = reverse . dropWhile (/= 's') . reverse $ f
               in (s,x)

{-
 - Reverses the order of a tuple, e.g. so that a Map may be made with a key to be found in the 
 - current second position of the tuple.
 -}
flip2 :: (a,b) -> (b,a)
flip2 (x,y) = (y,x)
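To make the pairing step in `main` concrete, here is a small stand-alone sketch of what `sample` followed by `Map.fromListWith max` does to two made-up pairs of files. `scrubName` and `mergePairs` are illustrative names, not part of the script above; `Data.Tuple.swap` from base does the same job as `flip2`.

```haskell
import Data.List (sort)
import qualified Data.Map as Map
import Data.Tuple (swap)

-- Keep everything up to and including the last 's' (the same rule as `sample`).
scrubName :: String -> String
scrubName = reverse . dropWhile (/= 's') . reverse

-- Key each (time, file) by its scrubbed name, keep the larger time of each pair,
-- then put the time first again and sort ascending by time.
mergePairs :: [(Int, String)] -> [(Int, String)]
mergePairs tfs =
  sort . map swap . Map.toList . Map.fromListWith max $
    [(scrubName f, t) | (t, f) <- tfs]

main :: IO ()
main = mapM_ (\(t, n) -> putStrLn (show t ++ " " ++ n)) (mergePairs example)
  where
    -- hypothetical (minutes, file name) pairs
    example =
      [ (31, "a--Hs--R1.fq.gz"), (35, "a--Hs--R2.fq.gz")
      , (13, "b--Hs--R1.fq.gz"), (12, "b--Hs--R2.fq.gz") ]
```

This prints `13 b--Hs` followed by `35 a--Hs`. Note that a name containing no 's' at all scrubs to the empty string, so all such files would collapse onto a single key.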


asked Sep 26 '14 14:09

archaephyrryx

1 Answer

I don't think there's a clear solution to your problem.

Without an interpreter or compiler on the remote machine, it's not possible to modify your Haskell source on that machine and then turn it into something the machine can execute.

As others have said, perhaps you could use a configuration file or command-line options so that the values most likely to change can be specified at run time.
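Following that suggestion, a minimal sketch of a configuration file read at run time might look like this. Everything in it is hypothetical: the `key value` file format, the keys `a`, `b`, `c` and `delim`, and the names `Config`, `parseConfig`, `estimate` and `scrub`. The defaults reproduce the coefficients and the `'s'` delimiter hard-coded in the question, so only the config file would need re-uploading when they change.

```haskell
import Data.Maybe (fromMaybe)
import qualified Data.Map as Map
import System.Environment (getArgs)
import Text.Read (readMaybe)

-- Run-time settings that the question currently hard-codes.
data Config = Config
  { coeffA    :: Double -- quadratic coefficient (minutes per MB^2)
  , coeffB    :: Double -- linear coefficient (minutes per MB)
  , coeffC    :: Double -- constant padding (minutes)
  , scrubChar :: Char   -- names are cut after the last occurrence of this
  } deriving (Show, Eq)

-- Defaults match the values compiled into size2time and sample.
defaultConfig :: Config
defaultConfig = Config 0.000025 0.03 10 's'

-- Parse lines of the form "key value", e.g. "b 0.04" or "delim s".
-- Malformed lines and unknown keys are ignored; missing keys keep their defaults.
parseConfig :: String -> Config
parseConfig txt =
  Config
    { coeffA = num "a" (coeffA defaultConfig)
    , coeffB = num "b" (coeffB defaultConfig)
    , coeffC = num "c" (coeffC defaultConfig)
    , scrubChar = delim
    }
  where
    kv = Map.fromList [(k, v) | [k, v] <- map words (lines txt)]
    num k d = fromMaybe d (Map.lookup k kv >>= readMaybe)
    delim = case Map.lookup "delim" kv of
      Just [ch] -> ch
      _ -> scrubChar defaultConfig

-- Quadratic run-time estimate in minutes, floored as in size2time.
estimate :: Config -> Int -> Int
estimate cfg sizeMB = floor (coeffA cfg * x * x + coeffB cfg * x + coeffC cfg)
  where
    x = fromIntegral sizeMB :: Double

-- Drop everything after the last scrubChar, as sample does for 's'.
scrub :: Config -> String -> String
scrub cfg = reverse . dropWhile (/= scrubChar cfg) . reverse

-- Usage: ./configdemo [configfile]; without an argument the defaults are used.
main :: IO ()
main = do
  args <- getArgs
  cfg <- case args of
    (path : _) -> parseConfig <$> readFile path
    [] -> return defaultConfig
  print cfg
  print (estimate cfg 500, scrub cfg "a--Hs--R1.fq.gz")
```

In the real script, `main` would read the config once and pass `cfg` to the size and name functions (for example `estimate cfg` in place of the hard-coded quadratic), so the compiled binary would not need to change when the coefficients or the delimiter do.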

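Or, assuming your remote machine has gcc installed, you could have GHC compile your Haskell code into C on your local machine, transfer it to the remote machine, try your best to make sense of how it translated your code, and make changes to the C code and recompile on the remote machine. (Be aware that current GHC versions only emit C when GHC itself is built unregisterised, that the generated C is not written for human readers, and that linking it still requires the GHC runtime libraries.)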


answered Sep 21 '22 00:09

user2752467
