
Traverse directory tree in node.js using RxJs

I'm trying to traverse a directory tree using RxJS and Node.js.

I came up with a working solution:

const filesInDir = Rx.Observable.fromNodeCallback(fs.readdir)
const statFile = Rx.Observable.fromNodeCallback(fs.stat)

const listFiles = (prefix, dir = '') => {
    const file$ = filesInDir(`${prefix}/${dir}`)
        .flatMap(file => file)
        .filter(file => !file.startsWith('.'))
    const isDir$ = file$
        .map(file => statFile(`${prefix}/${dir}/${file}`))
        .flatMap(file => file)
        .map(file => file.isDirectory())
    return file$
        .zip(isDir$, (file, isDir) => {return {file, isDir}})
        .map(f => {
            if (f.isDir) {
                return listFiles(prefix, `${dir}/${f.file}`)
            }
            return Rx.Observable.return(`${dir}/${f.file}`)
        })
        .flatMap(file => file)
}

listFiles('public')
    .toArray()
    .subscribe(list => {
        console.log(list)
    })

Questions:

  1. Is there a more efficient/concise way to .map using an asynchronous operation?
  2. Same question for the .zip part
Jakub Fedyczak asked Feb 08 '23 08:02


1 Answer

Great question.

I think you can do a few things to optimise this query.

Firstly, we can replace each map operator that is followed by .flatMap(file => file) with a single flatMap. It's a tiny improvement, but it runs less code.

const file$ = filesInDir(`${prefix}/${dir}`)
    .flatMap(file => file)
    .filter(file => !file.startsWith('.'))
const isDir$ = file$
    .flatMap(file => statFile(`${prefix}/${dir}/${file}`))
    .map(file => file.isDirectory())
return file$
    .zip(isDir$, (file, isDir) => {return {file, isDir}})
    .flatMap(f => {
        if (f.isDir) {
            return listFiles(prefix, `${dir}/${f.file}`)
        }
        return Rx.Observable.return(`${dir}/${f.file}`)
    })

The main improvement is that I believe you are actually hitting the file system twice. The filesInDir observable sequence is not a hot/cached sequence; if it were, the recursive walk of the directory tree wouldn't work. With that in mind, you are calling it once to get all the files, and then calling it again to do the isDirectory check.

This introduces both a potential performance cost and a bug. You are assuming that each time you hit the disk, the files will be returned in the same order. Even if we ignore for a second that the disk is mutable and could change under you, you cannot be guaranteed in an async world that the sequences will be returned in the same order. On my machine (Windows 10) the sequence is mostly returned in the same order. However, with a deep enough tree (e.g. from C:) I hit a mismatch every time.
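
To make the ordering problem concrete, here is a minimal sketch that deliberately uses no RxJS, only promises and timers. Everything in it is made up for the demo: fakeStat is a hypothetical stand-in for fs.stat, and the file names and delays are arbitrary. It only shows the general point that results of concurrent async calls arrive in completion order, so pairing them with the names by position (as zip does) can mislabel entries, while pairing inside the callback cannot.

```javascript
// Hypothetical stand-in for fs.stat: resolves after a per-file delay,
// so results arrive in completion order rather than request order.
const fakeStat = (name, delayMs) =>
    new Promise(resolve =>
        setTimeout(() => resolve({ name, isDir: !name.includes('.') }), delayMs))

const files = ['a.txt', 'docs', 'b.txt']
const delays = [30, 10, 20] // arbitrary: 'docs' finishes first, 'a.txt' last

// Collect results in the order they complete, as a merge of async calls would.
const collectInCompletionOrder = () => new Promise(resolve => {
    const arrived = []
    files.forEach((name, i) =>
        fakeStat(name, delays[i]).then(stat => {
            arrived.push(stat)
            if (arrived.length === files.length) resolve(arrived)
        }))
})

const demo = collectInCompletionOrder().then(arrived => {
    // Pairing by position: the i-th name gets the i-th result to arrive.
    const zipped = files.map((file, i) => ({ file, isDir: arrived[i].isDir }))
    // Pairing inside the callback: each result stays with its own name.
    const paired = arrived.map(stat => ({ file: stat.name, isDir: stat.isDir }))
    return { zipped, paired }
})

demo.then(({ zipped, paired }) => {
    console.log('by position:', zipped)
    console.log('by closure: ', paired)
})
```

With these delays, the positional pairing reports a.txt as a directory and docs as a file, whereas the closure-based pairing labels every entry correctly, whatever order the results arrive in.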

Anyway, the performance fix is also the bug fix. Instead of re-reading from the file system every time, we can do it once, by moving the statFile() call into a flatMap that also maps the stat result into an object together with the file name, which is still available in the closure.

const listFiles = (prefix, dir = '') => {
    return filesInDir(`${prefix}/${dir}`)
        .flatMap(file => file)
        .filter(file => !file.startsWith('.'))
        // stat each file and keep its name (from the closure) with the result
        .flatMap(file => statFile(`${prefix}/${dir}/${file}`)
                    .map(sf => ({file, isDir: sf.isDirectory()})))
        .flatMap(f => {
            if (f.isDir) {
                // recurse into sub-directories
                return listFiles(prefix, `${dir}/${f.file}`)
            }
            return Rx.Observable.return(`${dir}/${f.file}`)
        })
}

This also has the benefit of removing the .zip call, because we are no longer trying to work with two sequences.
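
For comparison, here is a rough sketch of the same "stat and pair in one step" idea written with fs.promises instead of RxJS, so it runs on plain Node without any dependency. It is not the code above, just an equivalent under assumptions: the temporary directory layout (index.html, css/site.css, .hidden) is hypothetical and exists only to have something to walk.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Promise-based sketch of the traversal (fs.promises swapped in for RxJS).
const listFiles = async (prefix, dir = '') => {
    const names = await fs.promises.readdir(path.join(prefix, dir))
    const visible = names.filter(name => !name.startsWith('.'))
    const nested = await Promise.all(visible.map(async name => {
        const rel = `${dir}/${name}`
        // The stat result is consumed right next to the name it belongs to,
        // so there is no second sequence to line up.
        const stat = await fs.promises.stat(path.join(prefix, rel))
        return stat.isDirectory() ? listFiles(prefix, rel) : [rel]
    }))
    return nested.flat()
}

// Build a small throwaway tree (hypothetical layout) to try it out.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'walk-'))
fs.mkdirSync(path.join(root, 'css'))
fs.writeFileSync(path.join(root, 'index.html'), '')
fs.writeFileSync(path.join(root, 'css', 'site.css'), '')
fs.writeFileSync(path.join(root, '.hidden'), '')

// Sort only to make the printed order deterministic.
const result = listFiles(root).then(list => list.sort())
result.then(list => console.log(list)) // [ '/css/site.css', '/index.html' ]
```

As in the RxJS version, paths are reported relative to the prefix and dot-files are skipped.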

Lee Campbell answered Feb 10 '23 22:02