Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filter a list of paths to only include files

If I have a list of FilePaths, how can I filter them to return only the ones that are regular files (namely, not symlinks or directories)?

For example, using getDirectoryContents

main = do
    contents <- getDirectoryContents "/foo/bar"
    let onlyFiles = filterFunction contents in
        print onlyFiles

where "filterFunction" is a function that returns only the FilePaths that represent files.

The answer may just work on Linux, but cross platform support is preferred.

[EDIT] Just using doesDirectoryExist doesn't work as expected. This script prints a list of everything in the directory, not just files:

module Main where

import System.Directory
import Control.Monad (filterM, liftM)

getFiles :: FilePath -> IO [FilePath]
getFiles root = do
    contents <- getDirectoryContents root
    filesHere <- filterM (liftM not . doesDirectoryExist) contents
    subdirs <- filterM doesDirectoryExist contents
    return filesHere

main = do
    files <- getFiles "/"
    print $ files

Additionally, the variable subdirs will only contain "." and "..".

like image 432
Langston Avatar asked Jul 15 '15 00:07

Langston


People also ask

How do I list files in a directory with full path?

Listing the full path The command DIR /b will return just a list of filenames, when displaying subfolders with DIR /b /s the command will return a full pathname. To list the full path without including subfolders, use the WHERE command.

How do you filter a file in Python?

To filter and list the files according to their names, we need to use “fnmatch. fnmatch()” and “os. listdir()” functions with name filtering regex patterns. You may find an example of filtering and listing files according to their names in Python.


2 Answers

To find standard library functions, Hoogle is a great resource; it's a Haskell search engine that lets you search by type. Using it requires figuring out how to think about types the Haskell Way™, though, which your proposed type signatures doesn't quite work with. So:

  1. You're looking for [Filepath] -> [Filepath]. Remember, the Haskell spelling is FilePath. So…

  2. You're looking for [FilePath] -> [FilePath]. This is unnecessary; if you want to filter things, you should use filter. So…

  3. You're looking for a function of type FilePath -> Bool that you can pass to filter. But this can't quite be right: this function needs to query the file system, which is an effect, and Haskell tracks effects in the type system using IO. So…

  4. You're looking for a function of type FilePath -> IO Bool.

And if we search for that on Hoogle, the first result is doesFileExist :: FilePath -> IO Bool from System.Directory. From the docs:

The operation doesFileExist returns True if the argument file exists and is not a directory, and False otherwise.

So System.Directory.doesFileExist is exactly what you want. (Well… only with a little extra work! See below.)

Now, how do you use it? You can't use filter here, because you have an effectful function. You could use Hoogle again – if filter has the type (a -> Bool) -> [a] -> [a], then annotating the results of the functions with a monad m gives you the new type Monad m => (a -> m Bool) -> [a] -> m [Bool] – but there's an easier "cheap trick". In general, if func is a function with an effectful/monadic version, that effectful/monadic version is called funcM, and it often lives in Control.Monad.¹ And indeed, there is a function Control.Monad.filterM :: Monad m => (a -> m Bool) -> [a] -> m [a].

However! Much as we hate to admit it, even in Haskell, types don't provide all the information you need. Importantly, we're going to have a problem here:

  • File paths given as arguments to functions are interpreted relative to the current directory, but…
  • getDirectoryContents returns paths relative to its argument.

Thus, there are two approaches we can take to fix things. The first is to adjust the results of getDirectoryContents so that they can be interpreted correctly. (We also discarding the . and .. results, although if you're just looking for regular files they won't hurt anything.) This will return file names which include the directory whose contents are being examined. The adjust getDirectoryContents function looks like this:

getQualifiedDirectoryContents :: FilePath -> IO [FilePath]
getQualifiedDirectoryContents fp =
    map (fp </>) . filter (`notElem` [".",".."]) <$> getDirectoryContents fp

The filter gets rid of the special directories, and the map prepends the argument directory to all the results. This makes the returned files acceptable arguments to doesFileExist. (If you haven't seen them before, (System.FilePath.</>) appends two file paths; and (Control.Applicative.<$>), also available as (Data.Functor.<$>), is an infix synonym for fmap, which is like liftM but more broadly applicable.)

Putting that all together, your final code becomes:

import Control.Applicative
import Control.Monad
import System.FilePath
import System.Directory

getQualifiedDirectoryContents :: FilePath -> IO [FilePath]
getQualifiedDirectoryContents fp =
    map (fp </>) . filter (`notElem` [".",".."]) <$> getDirectoryContents fp

main :: IO ()
main = do
  contents  <- getQualifiedDirectoryContents "/foo/bar"
  onlyFiles <- filterM doesFileExist contents
  print onlyFiles

Or, if you feel like being fancy/point-free:

import Control.Applicative
import Control.Monad
import System.FilePath
import System.Directory

getQualifiedDirectoryContents :: FilePath -> IO [FilePath]
getQualifiedDirectoryContents fp =
    map (fp </>) . filter (`notElem` [".",".."]) <$> getDirectoryContents fp

main :: IO ()
main =   print
     =<< filterM doesFileExist
     =<< getQualifiedDirectoryContents "/foo/bar"

The second approach is to adjust things so that doesFileExist runs with the appropriate current directory. This will return just the file name relative to the directory whose contents are being examined. To do this, we want to use the withCurrentDirectory :: FilePath -> IO a -> IO a function (but see below), and then pass getDirectoryContents the current directory "." argument. The documentation for withCurrentDirectory says (in part):

Run an IO action with the given working directory and restore the original working directory afterwards, even if the given action fails due to an exception.

Putting all this together gives us the following code

import Control.Monad
import System.Directory

main :: IO ()
main = withCurrentDirectory "/foo/bar" $
         print =<< filterM doesFileExist =<< getDirectoryContents "."

This is what we want, but unfortunately, it's only available in version 1.3.2.0 of the directory package – as of this writing, the most recent one, and not the one I have. Luckily, it's an easy function to implement; such set-a-value-locally functions are usually implemented in terms of Control.Exception.bracket :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c. The bracket function is run as bracket before after action, and it correctly handles exceptions. So we can define withCurrentDirectory ourselves:

withCurrentDirectory :: FilePath -> IO a -> IO a
withCurrentDirectory fp m =
  bracket getCurrentDirectory setCurrentDirectory $ \_ -> do
    setCurrentDirectory fp
    m

And then use this to get the final code:

import Control.Exception
import Control.Monad
import System.Directory

withCurrentDirectory :: FilePath -> IO a -> IO a
withCurrentDirectory fp m =
  bracket getCurrentDirectory setCurrentDirectory $ \_ -> do
    setCurrentDirectory fp
    m

main :: IO ()
main = withCurrentDirectory "/foo/bar" $
         print =<< filterM doesFileExist =<< getDirectoryContents "."

Also, one quick note about lets in dos: in a do block,

do ...foo...
   let x = ...bar...
   ...baz...

is equivalent to

do ...foo...
   let x = ...bar... in
     do ...baz...

So your example code doesn't need the in in the let and can outdent the print call.


¹ Not always: sometimes you want different classes of effects! Use Applicative from Control.Applicative when possible; more things are Applicatives than are Monads (although this means you can do less with them). In that case, the effectful functions may live there, or also in Data.Foldable or Data.Traversable.

like image 181
Antal Spector-Zabusky Avatar answered Oct 17 '22 02:10

Antal Spector-Zabusky


I happened to need a way to list only regular files in a directory, and this is how I do it. I thought it might be helpful:

import System.Directory

listFilesInDirectory :: FilePath -> IO [FilePath]
listFilesInDirectory dir = do
    rawList <- listDirectory dir
    filterM doesFileExist (map (dir </>) rawList)
like image 36
Dogweather Avatar answered Oct 17 '22 02:10

Dogweather