What Data Structure is best to use for file organization? Are B-Trees the best or is there another data structure which obtains faster access to files and good organization? Thanks
File Structures: Pile, Sequential, Indexed Sequential, Direct access, Inverted files; Indexing structures- B-tree and its variations.
Answer. Answer: A tree structure is the most common directory structure. B+ tree is best for keeping track of directories and files in a computer.
All file systems are different, so there are a huge number of data structures that actually get used in file systems.
The systems that are used to organize and maintain data files are known as file based data systems. These file systems are used to handle a single or multiple files and are not very efficient. The functionalities of a File-based Data Management System are as follows − A file based system helps in basic data management for any user.
Files could be arranged or more complex structures to reflect the relationship between them. A File Structure needs to be predefined format in such a way that an operating system understands . It has an exclusively defined structure, which is based on its type. Three types of files structure in OS:
We can implement file system by using two types data structures : 1. On-disk Structures – It is usually the first block of volume and it contains information needed to boot an operating system.In UNIX it is called boot block and in NTFS it is called as partition boot sector.
All file systems are different, so there are a huge number of data structures that actually get used in file systems.
Many file systems use some sort of bit vector (usually referred to as a bitmap) to track where certain free blocks are, since they have excellent performance for querying whether a specific block of disk is in use and (for disks that aren't overwhelmingly full) support reasonably fast lookups of free blocks.
Many older file systems (ext and ext2) stored directory structures using simple linked lists. Apparently this was actually fast enough for most applications, though some types of applications that used lots of large directories suffered noticeable performance hits.
The XFS file system was famous for using B+-trees for just about everything, including directory structures and its journaling system. From what I remember from my undergrad OS course, the philosophy was that since it took so long to write, debug, and performance tune the implementation of the B+-tree, it made sense to use it as much as possible.
Other file systems (ext3 and ext4) use a variant of the B-tree called the HTree that I'm not very familiar with. Apparently it uses some sort of hashing scheme to keep the branching factor high so that very few disk accesses are used.
I have heard anecdotally that some operating systems tried using splay trees to store their directory structures but ran into trouble with them. Specifically, it prevented multithreaded access to the same directory from multiple readers (since in a splay tree, each access reshapes the tree) and encountered an edge case where the tree would degenerate to a linked list if all elements of the tree were accesses sequentially. That said, I don't know if this is just an urban legend, since these problems would have been apparent before anyone tried to code them up.
Microsoft's FAT32 system used a huge array (the file allocation table) that store what files were stored where and which disk sectors follow one another logically in a file. The main drawback is that the table had to be set up in advance, so there ended up being upper limits on the sizes of files that could be stored on the disk. However, the array-based system was pretty easy to implement.
This is not an exhaustive list - I'm sure that other file systems use other data structures. However, I hope it helps give you a push in the right direction.
Hope this helps!
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With