How to list directories faster?

I have a few situations where I need to list files recursively, but my implementations have been slow. I have a directory structure with 92784 files. find lists the files in less than 0.5 seconds, but my Haskell implementation is a lot slower.

My first implementation took a bit over 9 seconds to complete, the next version a bit over 5 seconds, and I'm currently down to a bit less than 2 seconds.

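import Control.Monad (forM)
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath ((</>))

listFilesR :: FilePath -> IO [FilePath]
listFilesR path = do
    allfiles <- getDirectoryContents path
    dirs <- forM allfiles $ \d ->
      if isDODD d
        then do
          let p = path </> d
          isDir <- doesDirectoryExist p
          -- recurse into directories; return the full path of plain files
          if isDir then listFilesR p else return [p]
        else return []
    return $ concat dirs
  where
    -- True for real entries, False for the "." and ".." pseudo-entries
    isDODD "." = False
    isDODD ".." = False
    isDODD _ = True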

The test uses about 100 megabytes of memory (measured with +RTS -s), and the program spends around 40% of its time in GC.

I was thinking of doing the listing in a WriterT monad with Sequence as the monoid, to avoid the concats and the intermediate list creation. Is this likely to help? What else should I do?
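A minimal sketch of that idea, under stated assumptions: instead of WriterT it threads a Data.Sequence accumulator through the traversal by hand, which avoids the per-directory concat while staying within the containers, directory and filepath packages. The names listFilesSeq and listFilesViaSeq are hypothetical, and it still calls getDirectoryContents, so each directory's entry list is still allocated. Whether it is actually faster would have to be measured.

```haskell
import Control.Monad (foldM)
import Data.Foldable (toList)
import Data.Sequence (Seq, (|>))
import qualified Data.Sequence as Seq
import System.Directory
import System.FilePath ((</>))

-- Hypothetical helper: collect all file paths below `root` in a Seq,
-- appending to one accumulator instead of concatenating lists.
listFilesSeq :: FilePath -> IO (Seq FilePath)
listFilesSeq root = go Seq.empty root
  where
    -- Visit every entry of `dir`, extending the accumulator.
    go :: Seq FilePath -> FilePath -> IO (Seq FilePath)
    go acc dir = do
      entries <- getDirectoryContents dir
      foldM (visit dir) acc (filter (`notElem` [".", ".."]) entries)

    -- Recurse into directories; append the full path of anything else.
    visit :: FilePath -> Seq FilePath -> FilePath -> IO (Seq FilePath)
    visit dir acc name = do
      let p = dir </> name
      isDir <- doesDirectoryExist p
      if isDir
        then go acc p
        else return $! acc |> p

-- Hypothetical wrapper: the same result as an ordinary list.
listFilesViaSeq :: FilePath -> IO [FilePath]
listFilesViaSeq = fmap toList . listFilesSeq
```

For example, listFilesViaSeq "." >>= mapM_ putStrLn would print every file below the current directory.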

Edit: I have changed the function to use readDirStream, which helps keep memory use down. There is still some allocation happening, but productivity is now above 95% and it runs in less than a second.

This is the current version:

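import System.Directory (doesDirectoryExist)
import System.FilePath ((</>))
import System.Posix.Directory (DirStream, closeDirStream, openDirStream, readDirStream)

list :: FilePath -> IO ()
list path = do
  de <- openDirStream path
  readDirStream de >>= go de
  closeDirStream de
  where
    go :: DirStream -> FilePath -> IO ()
    go _ [] = return ()  -- readDirStream returns "" at the end of the stream
    go d "." = readDirStream d >>= go d
    go d ".." = readDirStream d >>= go d
    go d x = do
      let newpath = path </> x
      e <- doesDirectoryExist newpath
      if e
        then list newpath >> readDirStream d >>= go d
        else putStrLn newpath >> readDirStream d >>= go d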

asked Oct 07 '10 12:10 by Masse


1 Answer

I think that System.Directory.getDirectoryContents constructs the whole list of entries at once and therefore uses a lot of memory. How about using System.Posix.Directory instead? System.Posix.Directory.readDirStream returns the entries one at a time.
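As a rough sketch of what that could look like (not a definitive implementation): the hypothetical helper walkFiles below reads one entry at a time with readDirStream and passes each file path to a caller-supplied action. Using bracket to close the stream even if the action throws is my own assumption about what is desirable here. Note that it keeps one directory stream open per level of nesting, and that doesDirectoryExist follows symbolic links.

```haskell
import Control.Exception (bracket)
import System.Directory
import System.FilePath ((</>))
import System.Posix.Directory (DirStream, closeDirStream, openDirStream, readDirStream)

-- Hypothetical helper: run `act` on every file below `path`,
-- reading one directory entry at a time instead of building a list.
walkFiles :: (FilePath -> IO ()) -> FilePath -> IO ()
walkFiles act path = bracket (openDirStream path) closeDirStream loop
  where
    loop :: DirStream -> IO ()
    loop ds = do
      name <- readDirStream ds
      case name of
        ""   -> return ()   -- an empty string marks the end of the stream
        "."  -> loop ds     -- skip the pseudo-entries
        ".." -> loop ds
        _    -> do
          let p = path </> name
          isDir <- doesDirectoryExist p
          if isDir then walkFiles act p else act p
          loop ds
```

For example, walkFiles putStrLn "." prints every file below the current directory without ever holding the full listing in memory.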

Also, the FileManip library might be useful, although I have never used it.

answered Sep 28 '22 03:09 by Tsuyoshi Ito