I'm trying to traverse a directory tree using RxJS and Node.js.
I came up with a working solution:
const fs = require('fs')
const Rx = require('rx') // assumed: RxJS 4, which provides fromNodeCallback

const filesInDir = Rx.Observable.fromNodeCallback(fs.readdir)
const statFile = Rx.Observable.fromNodeCallback(fs.stat)

const listFiles = (prefix, dir = '') => {
  // Emit each non-hidden entry of the directory
  const file$ = filesInDir(`${prefix}/${dir}`)
    .flatMap(file => file)
    .filter(file => !file.startsWith('.'))

  // For each entry, emit whether it is a directory
  const isDir$ = file$
    .map(file => statFile(`${prefix}/${dir}/${file}`))
    .flatMap(file => file)
    .map(file => file.isDirectory())

  // Pair names with flags, then recurse into directories
  return file$
    .zip(isDir$, (file, isDir) => {return {file, isDir}})
    .map(f => {
      if (f.isDir) {
        return listFiles(prefix, `${dir}/${f.file}`)
      }
      return Rx.Observable.return(`${dir}/${f.file}`)
    })
    .flatMap(file => file)
}

listFiles('public')
  .toArray()
  .subscribe(list => {
    console.log(list)
  })
Questions:
.map using asynchronous operation?
.zip part

Great question.
I think you can do a few things to optimise this query.
Firstly, we can change each map operator followed by .flatMap(file => file) into a single flatMap. It is a tiny improvement, but it will run less code.
const file$ = filesInDir(`${prefix}/${dir}`)
  .flatMap(file => file)
  .filter(file => !file.startsWith('.'))

const isDir$ = file$
  .flatMap(file => statFile(`${prefix}/${dir}/${file}`))
  .map(file => file.isDirectory())

return file$
  .zip(isDir$, (file, isDir) => {return {file, isDir}})
  .flatMap(f => {
    if (f.isDir) {
      return listFiles(prefix, `${dir}/${f.file}`)
    }
    return Rx.Observable.return(`${dir}/${f.file}`)
  })
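
The same simplification can be checked on plain arrays, where Array.prototype.flatMap plays the role of the Rx operator. This is only an analogy, not Rx code, and `expand` is a made-up helper for the illustration:

```javascript
// Hypothetical helper: turns one item into several, like an inner sequence.
const expand = n => [n, n * 10]

// Two steps: map to nested arrays, then flatten with an identity flatMap.
const twoSteps = [1, 2, 3].map(expand).flatMap(x => x)

// One step: flatMap maps and flattens at once.
const oneStep = [1, 2, 3].flatMap(expand)

console.log(twoSteps) // [ 1, 10, 2, 20, 3, 30 ]
console.log(oneStep)  // [ 1, 10, 2, 20, 3, 30 ]
```

Both forms produce the same values; the second just skips the intermediate nested sequence.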
The main improvement is where I believe you are actually hitting the file system twice.
The filesInDir observable sequence is not a hot/cached sequence.
If it were, the recursive walking of the directory tree wouldn't work.
With that in mind, you are calling it once to get all the files, and then you are calling it again to do the isDirectory check.
This introduces both a potential performance cost and a bug.
You are assuming that whenever you hit the disk, the files will always be returned in the same order.
Even if we ignore for a second that the disk is mutable and could change under you, in an async world you cannot be guaranteed that the sequences will be returned in the same order.
On my machine (Windows 10) the sequence is mostly returned in the same order.
However, with a deep enough tree (e.g. from C:) I hit a mismatch every time.
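
One way such a mismatch can arise, regardless of how the directory listing is obtained, is that a merging flatMap emits inner results as they complete rather than in source order. Below is a rough sketch of that effect using plain Promises instead of Rx; `fakeStat`, the file names, and the delays are all invented for the illustration:

```javascript
// Hypothetical stand-in for fs.stat: answers after a per-file delay.
// Only 'b' is a directory in this made-up listing.
const delays = { a: 30, b: 10, c: 20 }
const fakeStat = name =>
  new Promise(resolve =>
    setTimeout(() => resolve({ isDirectory: () => name === 'b' }), delays[name]))

const names = ['a', 'b', 'c']

// Collect results in the order they complete, as a merging flatMap would.
const inCompletionOrder = () =>
  new Promise(resolve => {
    const results = []
    names.forEach(name =>
      fakeStat(name).then(stat => {
        results.push(stat.isDirectory())
        if (results.length === names.length) resolve(results)
      }))
  })

// Pair the two sequences by position, which is what zip does.
const zipped = inCompletionOrder().then(isDirs =>
  names.map((file, i) => ({ file, isDir: isDirs[i] })))

// 'a' ends up flagged as a directory, although only 'b' is one.
zipped.then(pairs => console.log(pairs))
```

Because 'b' answers first, its flag lands in the first slot and is paired with 'a'. Pairing by position is only safe when both sequences are guaranteed to share an order.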
Anyway, the performance fix is also the bug fix.
Instead of re-reading from the file system every time, we can do it once.
We move the statFile() call into a flatMap and map its result inside that same closure, so the file name passed to statFile is still in scope:
const listFiles = (prefix, dir = '') => {
  return filesInDir(`${prefix}/${dir}`)
    .flatMap(file => file)
    .filter(file => !file.startsWith('.'))
    // Stat each file and keep its name alongside the result
    .flatMap(file => statFile(`${prefix}/${dir}/${file}`)
      .map(sf => {return {file, isDir: sf.isDirectory()}}))
    // Recurse into directories, emit plain files as paths
    .flatMap(f => {
      if (f.isDir) {
        return listFiles(prefix, `${dir}/${f.file}`)
      }
      return Rx.Observable.return(`${dir}/${f.file}`)
    })
}
This also has the benefit of removing the zip call, because we are no longer trying to work with two sequences.
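
For comparison, here is a minimal sketch of the same single-pass idea written with Node's fs.promises instead of Rx. It is not the Rx solution above, just the same structure in a different style: each name stays in scope next to its own stat result, so nothing has to be paired up afterwards. The temporary demo tree and its file names are made up.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Read the directory once, stat each entry, and recurse into directories.
const listFiles = async (prefix, dir = '') => {
  const names = await fs.promises.readdir(path.join(prefix, dir))
  const nested = await Promise.all(
    names
      .filter(name => !name.startsWith('.'))
      .map(async name => {
        // name is captured by this closure together with its stat result.
        const stat = await fs.promises.stat(path.join(prefix, dir, name))
        return stat.isDirectory()
          ? listFiles(prefix, `${dir}/${name}`)
          : [`${dir}/${name}`]
      }))
  return nested.flat()
}

// Throwaway demo tree in the OS temp directory (layout is hypothetical).
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'walk-'))
fs.mkdirSync(path.join(root, 'css'))
fs.writeFileSync(path.join(root, 'index.html'), '')
fs.writeFileSync(path.join(root, 'css', 'site.css'), '')
fs.writeFileSync(path.join(root, '.hidden'), '')

const result = listFiles(root).then(list => list.sort())
result.then(list => console.log(list)) // [ '/css/site.css', '/index.html' ]
```

The result is sorted before printing because the order in which readdir returns entries should not be relied on.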