Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

dealing with IO vs pure code in haskell

Tags:

io

haskell

I'm writing a shell script (my 1st non-example in haskell) which is supposed to list a directory, get every file size, do some string manipulation (pure code) and then rename some files. I'm not sure what i'm doing wrong, so 2 questions:

  1. How should i arrange the code in such program?
  2. I have a specific issue, i get the following error, what am i doing wrong?
error:
    Couldn't match expected type `[FilePath]'
           against inferred type `IO [FilePath]'
    In the second argument of `mapM', namely `fileNames'
    In a stmt of a 'do' expression:
        files <- (mapM getFileNameAndSize fileNames)
    In the expression:
        do { fileNames <- getDirectoryContents;
             files <- (mapM getFileNameAndSize fileNames);
             sortBy cmpFilesBySize files }

code:

getFileNameAndSize fname = do (fname,  (withFile fname ReadMode hFileSize))

getFilesWithSizes = do
  fileNames <- getDirectoryContents
  files <- (mapM getFileNameAndSize fileNames)
  sortBy cmpFilesBySize files
like image 485
Drakosha Avatar asked May 29 '10 07:05

Drakosha


People also ask

How does Haskell handle IO?

Haskell separates pure functions from computations where side effects must be considered by encoding those side effects as values of a particular type. Specifically, a value of type (IO a) is an action, which if executed would produce a value of type a .

Is IO pure Haskell?

Haskell is a pure language and even the I/O system can't break this purity. Being pure means that the result of any function call is fully determined by its arguments. Procedural entities like rand() or getchar() in C, which return different results on each call, are simply impossible to write in Haskell.

Is IO impure Haskell?

In spite of Haskell being purely functional, IO actions can be said to be impure because their impacts on the outside world are side effects (as opposed to the regular effects that are entirely contained within Haskell).

How does Haskell handle side effects?

The functional language Haskell eliminates side effects such as I/O and other stateful computations by replacing them with monadic actions. Functional languages such as Standard ML, Scheme and Scala do not restrict side effects, but it is customary for programmers to avoid them.


1 Answers

Your second, specific, problem is with the types of your functions. However, your first issue (not really a type thing) is the do statement in getFileNameAndSize. While do is used with monads, it's not a monadic panacea; it's actually implemented as some simple translation rules. The Cliff's Notes version (which isn't exactly right, thanks to some details involving error handling, but is close enough) is:

  1. do aa
  2. do a ; b ; c ...a >> do b ; c ...
  3. do x <- a ; b ; c ...a >>= \x -> do b ; c ...

In other words, getFileNameAndSize is equivalent to the version without the do block, and so you can get rid of the do. This leaves you with

getFileNameAndSize fname = (fname, withFile fname ReadMode hFileSize)

We can find the type for this: since fname is the first argument to withFile, it has type FilePath; and hFileSize returns an IO Integer, so that's the type of withFile .... Thus, we have getFileNameAndSize :: FilePath -> (FilePath, IO Integer). This may or may not be what you want; you might instead want FilePath -> IO (FilePath,Integer). To change it, you can write any of

getFileNameAndSize_do    fname = do size <- withFile fname ReadMode hFileSize
                                    return (fname, size)
getFileNameAndSize_fmap  fname = fmap ((,) fname) $
                                      withFile fname ReadMode hFileSize
-- With `import Control.Applicative ((<$>))`, which is a synonym for fmap.
getFileNameAndSize_fmap2 fname =     ((,) fname)
                                 <$> withFile fname ReadMode hFileSize
-- With {-# LANGUAGE TupleSections #-} at the top of the file
getFileNameAndSize_ts    fname = (fname,) <$> withFile fname ReadMode hFileSize

Next, as KennyTM pointed out, you have fileNames <- getDirectoryContents; since getDirectoryContents has type FilePath -> IO FilePath, you need to give it an argument. (e.g. getFilesWithSizes dir = do fileNames <- getDirectoryContents dir ...). This is probably just a simple oversight.

Mext, we come to the heart of your error: files <- (mapM getFileNameAndSize fileNames). I'm not sure why it gives you the precise error it does, but I can tell you what's wrong. Remember what we know about getFileNameAndSize. In your code, it returns a (FilePath, IO Integer). However, mapM is of type Monad m => (a -> m b) -> [a] -> m [b], and so mapM getFileNameAndSize is ill-typed. You want getFileNameAndSize :: FilePath -> IO (FilePath,Integer), like I implemented above.

Finally, we need to fix your last line. First of all, although you don't give it to us, cmpFilesBySize is presumably a function of type (FilePath, Integer) -> (FilePath, Integer) -> Ordering, comparing on the second element. This is really simple, though: using Data.Ord.comparing :: Ord a => (b -> a) -> b -> b -> Ordering, you can write this comparing snd, which has type Ord b => (a, b) -> (a, b) -> Ordering. Second, you need to return your result wrapped up in the IO monad rather than just as a plain list; the function return :: Monad m => a -> m a will do the trick.

Thus, putting this all together, you'll get

import System.IO           (FilePath, withFile, IOMode(ReadMode), hFileSize)
import System.Directory    (getDirectoryContents)
import Control.Applicative ((<$>))
import Data.List           (sortBy)
import Data.Ord            (comparing)

getFileNameAndSize :: FilePath -> IO (FilePath, Integer)
getFileNameAndSize fname = ((,) fname) <$> withFile fname ReadMode hFileSize

getFilesWithSizes :: FilePath -> IO [(FilePath,Integer)]
getFilesWithSizes dir = do fileNames <- getDirectoryContents dir
                           files     <- mapM getFileNameAndSize fileNames
                           return $ sortBy (comparing snd) files

This is all well and good, and will work fine. However, I might write it slightly differently. My version would probably look like this:

{-# LANGUAGE TupleSections #-}
import System.IO           (FilePath, withFile, IOMode(ReadMode), hFileSize)
import System.Directory    (getDirectoryContents)
import Control.Applicative ((<$>))
import Control.Monad       ((<=<))
import Data.List           (sortBy)
import Data.Ord            (comparing)

preservingF :: Functor f => (a -> f b) -> a -> f (a,b)
preservingF f x = (x,) <$> f x
-- Or liftM2 (<$>) (,), but I am not entirely sure why.

fileSize :: FilePath -> IO Integer
fileSize fname = withFile fname ReadMode hFileSize

getFilesWithSizes :: FilePath -> IO [(FilePath,Integer)]
getFilesWithSizes = return .   sortBy (comparing snd)
                           <=< mapM (preservingF fileSize)
                           <=< getDirectoryContents 

(<=< is the monadic equivalent of ., the function composition operator.) First off: yes, my version is longer. However, I'd probably already have preservingF defined somewhere, making the two equivalent in length.* (I might even inline fileSize if it weren't used elsewhere.) Second, I like this version better because it involves chaining together simpler pure functions we've already written. While your version is similar, mine (I feel) is more streamlined and makes this aspect of things clearer.

So this is a bit of an answer to your first question of how to structure these things. I personally tend to lock my IO down into as few functions as possible—only functions which need to touch the outside world directly (e.g. main and anything which interacts with a file) get an IO. Everything else is an ordinary pure function (and is only monadic if it's monadic for general reasons, along the lines of preservingF). I then arrange things so that main, etc., are just compositions and chains of pure functions: main gets some values from IO-land; then it calls pure functions to fold, spindle, and mutilate the date; then it gets more IO values; then it operates more; etc. The idea is to separate the two domains as much as possible, so that the more compositional non-IO code is always free, and the black-box IO is only done precisely where necessary.

Operators like <=< really help with writing code in this style, as they let you operate on functions which interact with monadic values (such as the IO-world) just as you would operate on normal functions. You should also look at Control.Applicative's function <$> liftedArg1 <*> liftedArg2 <*> ... notation, which lets you apply ordinary functions to any number of monadic (really Applicative) arguments. This is really nice for getting rid of spurious <-s and just chaining pure functions over monadic code.

*: I feel like preservingF, or at least its sibling preserving :: (a -> b) -> a -> (a,b), should be in a package somewhere, but I've been unable to find either.

like image 85
Antal Spector-Zabusky Avatar answered Oct 20 '22 13:10

Antal Spector-Zabusky