I have millions of audio files, generated based on GUId (http://en.wikipedia.org/wiki/Globally_Unique_Identifier). How can I store these files in the file-system so that I can efficiently add more files in the same file-system and can search for a particular file efficiently. Also it should be scalable in future.
Files are named based on GUId (unique file name).
Eg:
[1] 63f4c070-0ab2-102d-adcb-0015f22e2e5c
[2] ba7cd610-f268-102c-b5ac-0013d4a7a2d6
[3] d03cf036-0ab2-102d-adcb-0015f22e2e5c
[4] d3655a36-0ab3-102d-adcb-0015f22e2e5c
Pl. give your views.
PS: I have already gone through < Storing a large number of images >. I need the particular data-structure/algorithm/logic so that it can also be scalable in future.
EDIT1: Files are around 1-2 millions in number and file system is ext3 (CentOS).
Thanks,
Naveen
That's very easy - build a folder tree based on GUID values parts.
For example, make 256 folders each named after the first byte and only store there files that have a GUID starting with this byte. If that's still too many files in one folder - do the same in each folder for the second byte of the GUID. Add more levels if needed. Search for a file will be very fast.
By selecting the number of bytes you use for each level you can effectively choose the tree structure for your scenario.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With