If I have a list of FilePaths
, how can I filter them to return only the ones that are regular files (namely, not symlinks or directories)?
For example, using getDirectoryContents
main = do
contents <- getDirectoryContents "/foo/bar"
let onlyFiles = filterFunction contents in
print onlyFiles
where "filterFunction" is a function that returns only the FilePaths
that represent files.
The answer may just work on Linux, but cross platform support is preferred.
[EDIT] Just using doesDirectoryExist doesn't work as expected. This script prints a list of everything in the directory, not just files:
module Main where
import System.Directory
import Control.Monad (filterM, liftM)
getFiles :: FilePath -> IO [FilePath]
getFiles root = do
contents <- getDirectoryContents root
filesHere <- filterM (liftM not . doesDirectoryExist) contents
subdirs <- filterM doesDirectoryExist contents
return filesHere
main = do
files <- getFiles "/"
print $ files
Additionally, the variable subdirs will only contain "."
and ".."
.
Listing the full path The command DIR /b will return just a list of filenames, when displaying subfolders with DIR /b /s the command will return a full pathname. To list the full path without including subfolders, use the WHERE command.
To filter and list the files according to their names, we need to use “fnmatch. fnmatch()” and “os. listdir()” functions with name filtering regex patterns. You may find an example of filtering and listing files according to their names in Python.
To find standard library functions, Hoogle is a great resource; it's a Haskell search engine that lets you search by type. Using it requires figuring out how to think about types the Haskell Way™, though, which your proposed type signatures doesn't quite work with. So:
You're looking for [Filepath] -> [Filepath]
. Remember, the Haskell spelling is FilePath
. So…
You're looking for [FilePath] -> [FilePath]
. This is unnecessary; if you want to filter things, you should use filter
. So…
You're looking for a function of type FilePath -> Bool
that you can pass to filter
. But this can't quite be right: this function needs to query the file system, which is an effect, and Haskell tracks effects in the type system using IO
. So…
You're looking for a function of type FilePath -> IO Bool
.
And if we search for that on Hoogle, the first result is doesFileExist :: FilePath -> IO Bool
from System.Directory
. From the docs:
The operation
doesFileExist
returnsTrue
if the argument file exists and is not a directory, andFalse
otherwise.
So System.Directory.doesFileExist
is exactly what you want. (Well… only with a little extra work! See below.)
Now, how do you use it? You can't use filter
here, because you have an effectful function. You could use Hoogle again – if filter
has the type (a -> Bool) -> [a] -> [a]
, then annotating the results of the functions with a monad m
gives you the new type Monad m => (a -> m Bool) -> [a] -> m [Bool]
– but there's an easier "cheap trick". In general, if func
is a function with an effectful/monadic version, that effectful/monadic version is called funcM
, and it often lives in Control.Monad
.¹ And indeed, there is a function Control.Monad.filterM :: Monad m => (a -> m Bool) -> [a] -> m [a]
.
However! Much as we hate to admit it, even in Haskell, types don't provide all the information you need. Importantly, we're going to have a problem here:
getDirectoryContents
returns paths relative to its argument.Thus, there are two approaches we can take to fix things. The first is to adjust the results of getDirectoryContents
so that they can be interpreted correctly. (We also discarding the .
and ..
results, although if you're just looking for regular files they won't hurt anything.) This will return file names which include the directory whose contents are being examined. The adjust getDirectoryContents
function looks like this:
getQualifiedDirectoryContents :: FilePath -> IO [FilePath]
getQualifiedDirectoryContents fp =
map (fp </>) . filter (`notElem` [".",".."]) <$> getDirectoryContents fp
The filter
gets rid of the special directories, and the map
prepends the argument directory to all the results. This makes the returned files acceptable arguments to doesFileExist
. (If you haven't seen them before, (System.FilePath.</>)
appends two file paths; and (Control.Applicative.<$>)
, also available as (Data.Functor.<$>)
, is an infix synonym for fmap
, which is like liftM
but more broadly applicable.)
Putting that all together, your final code becomes:
import Control.Applicative
import Control.Monad
import System.FilePath
import System.Directory
getQualifiedDirectoryContents :: FilePath -> IO [FilePath]
getQualifiedDirectoryContents fp =
map (fp </>) . filter (`notElem` [".",".."]) <$> getDirectoryContents fp
main :: IO ()
main = do
contents <- getQualifiedDirectoryContents "/foo/bar"
onlyFiles <- filterM doesFileExist contents
print onlyFiles
Or, if you feel like being fancy/point-free:
import Control.Applicative
import Control.Monad
import System.FilePath
import System.Directory
getQualifiedDirectoryContents :: FilePath -> IO [FilePath]
getQualifiedDirectoryContents fp =
map (fp </>) . filter (`notElem` [".",".."]) <$> getDirectoryContents fp
main :: IO ()
main = print
=<< filterM doesFileExist
=<< getQualifiedDirectoryContents "/foo/bar"
The second approach is to adjust things so that doesFileExist
runs with the appropriate current directory. This will return just the file name relative to the directory whose contents are being examined. To do this, we want to use the withCurrentDirectory :: FilePath -> IO a -> IO a
function (but see below), and then pass getDirectoryContents
the current directory "."
argument. The documentation for withCurrentDirectory
says (in part):
Run an
IO
action with the given working directory and restore the original working directory afterwards, even if the given action fails due to an exception.
Putting all this together gives us the following code
import Control.Monad
import System.Directory
main :: IO ()
main = withCurrentDirectory "/foo/bar" $
print =<< filterM doesFileExist =<< getDirectoryContents "."
This is what we want, but unfortunately, it's only available in version 1.3.2.0 of the directory
package – as of this writing, the most recent one, and not the one I have. Luckily, it's an easy function to implement; such set-a-value-locally functions are usually implemented in terms of Control.Exception.bracket :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c
. The bracket
function is run as bracket before after action
, and it correctly handles exceptions. So we can define withCurrentDirectory
ourselves:
withCurrentDirectory :: FilePath -> IO a -> IO a
withCurrentDirectory fp m =
bracket getCurrentDirectory setCurrentDirectory $ \_ -> do
setCurrentDirectory fp
m
And then use this to get the final code:
import Control.Exception
import Control.Monad
import System.Directory
withCurrentDirectory :: FilePath -> IO a -> IO a
withCurrentDirectory fp m =
bracket getCurrentDirectory setCurrentDirectory $ \_ -> do
setCurrentDirectory fp
m
main :: IO ()
main = withCurrentDirectory "/foo/bar" $
print =<< filterM doesFileExist =<< getDirectoryContents "."
Also, one quick note about let
s in do
s: in a do
block,
do ...foo...
let x = ...bar...
...baz...
is equivalent to
do ...foo...
let x = ...bar... in
do ...baz...
So your example code doesn't need the in
in the let
and can outdent the print
call.
¹ Not always: sometimes you want different classes of effects! Use Applicative
from Control.Applicative
when possible; more things are Applicative
s than are Monad
s (although this means you can do less with them). In that case, the effectful functions may live there, or also in Data.Foldable
or Data.Traversable
.
I happened to need a way to list only regular files in a directory, and this is how I do it. I thought it might be helpful:
import System.Directory
listFilesInDirectory :: FilePath -> IO [FilePath]
listFilesInDirectory dir = do
rawList <- listDirectory dir
filterM doesFileExist (map (dir </>) rawList)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With