Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Windows File system API to query large files

I have HDD (say 1TB) with FAT32 and NTFS partitions and I dont have information on which all files are stored on it, but when needed I want to quickly access large files say over 500 MB. I dont want to scan my whole HDD since it is very time consuming. I need quick results. I was wondering if there are any NTFS/FAT32 APIs that I can directly call - i mean if they have some metadata about the files that are stored then it will be quicker. I want to write my program in C++ and C#.

EDIT If scanning the HDD is the only option then what all can I do to ensure best performance. Like - I could skip scanning system folders, since I am only interested in user data.

like image 763
Nitin Chaudhari Avatar asked Dec 28 '22 21:12

Nitin Chaudhari


1 Answers

If you're willing to do a lot of extra work yourself to speed things up, you might be able to accomplish something. A lot is going to depend on what you need.

Let's start with FAT32. FAT (in general, not just the 32-bit variant) is named for the File Allocation Table. This is a block of data toward the beginning of the partition that tells which clusters in the partition belong to which files. The FAT is basically organized as linked lists of clusters. If you just want to find the data areas for the large files, you can read the FAT in as a number of raw sectors, and scan through that data to find linked lists of more than X clusters (where X defines the lower limit for what you consider a large file). You can then access those clusters and see the actual data associated with each file. Oddly, what you won't know is the name of that file. The file names are contained in directories, which are basically like files, except that what they contain are a fixed-size records of a specified format. You have to start from the root directory, and read through the directory tree to find file names.

NTFS is both simpler and more complex. NTFS has a Master File Table (MFT) to contains records for all the files in a partition. The good point is that you can read the MFT and get information about every file on the disk without chasing through the directory tree to get it. The bad point is that decoding the contents of an NTFS partition is definitely non-trivial. Reading data (meaningfully) is quite difficult -- and writing data much more difficult. Also, recent versions of Windows have added more restrictions on raw reading from disk partitions, so depending on what partition you're after, you may not be able to access the data you need at all.

None of this, however, is anything that's more than minimally supported. To do it, you open a file named "\.\D:" (where D=letter of the disk you care about). You can then read raw sectors from that disk drive (assuming that opening it worked). This will let you see the raw data for the entire disk (or partition, as the case may be) starting from the boot sector, and going through everything else that's there (FAT, root directory, subdirectories, etc. -- all as sectors of raw data). The system will let you read the raw data, but all the work to make any sense of that data is 100% your responsibility. If the speed you've asked about is an absolute necessity, this may be a possibility -- but it'll take a fair amount of work for FAT volumes, and considerably more than that for NTFS. Unless you really need extra speed like you've said, it's probably not even worth considering trying to do this.

like image 168
Jerry Coffin Avatar answered Dec 31 '22 12:12

Jerry Coffin