Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Tree traversal algorithm for directory structures with a lot of files

When recursively traversing through a directory structure, what is the most efficient algorithm to use if you have more files than directories? I notice that when using depth-first traversal, it seems to take longer when there are a lot of files in a given directory. Does breadth-first traversal work more efficiently in this case? I have no way to profile the two algorithms at the moment so your insights are very much welcome.

EDIT: In response to alphazero's comment, I'm using PHP on a Linux machine.

like image 437
oninea Avatar asked Dec 17 '09 12:12

oninea


People also ask

Which data structure is best for file directory?

Answer. Answer: A tree structure is the most common directory structure. B+ tree is best for keeping track of directories and files in a computer.

What are the most common scheme for defining the logical structure of a directory?

Tree-structured directory – A tree structure is the most common directory structure. The tree has a root directory, and every file in the system has a unique path.

How can you explain the structure of a directory using tree hierarchy?

Tree-structured directoryThe directory is structured in the form of a tree. It also has a root directory, and every file in the system has a unique path. A directory within a tree-structured directory may contain files or subdirectories. Special system calls are used to create or remove directories.


1 Answers

Since you have more files than directories, it does not appear as if you are dealing with very deeply nested directories that would make DFS to take more memory (and hence somewhat more time) than BFS. Essentially, BFS and DFS both do the same thing (i.e. visit every node of the graph), and so in general their speeds should not differ by any significant amount.

It is difficult to say why exactly your DFS is slower without actually seeing your implementation. Are you sure you are not visiting the same nodes more than once due to links/shortcuts in your filesystem? You will also probably get a significant speedup if you use an explicit stack based DFS rather than recursion.

like image 180
MAK Avatar answered Oct 08 '22 09:10

MAK