
Evenly spread files in directories using UUID splits

Tags: uuid, hash

I am investigating the best way to build an evenly distributed two-level directory structure for storing files received by a distributed application (the files arrive without file names). My initial plan was to take the first two characters of a hash string for the first-level directory and the next two for the second-level directory, e.g.:

A hash of 67cabf2cf7418461ad53d9fd7e067049 can be used to store a file of the same name in the following directory structure: /67/ca/67cabf2cf7418461ad53d9fd7e067049
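The hash-based layout described above could be sketched roughly as follows. This is only an illustration: the function names `shard_path_from_digest` and `md5_hex` are hypothetical, and MD5 is assumed purely because the example digest is 32 hex characters long; the question does not say which hash function is used.

```python
import hashlib
from pathlib import PurePosixPath


def shard_path_from_digest(digest: str) -> PurePosixPath:
    """Build /<chars 0-1>/<chars 2-3>/<digest> from a hex digest string."""
    return PurePosixPath("/", digest[:2], digest[2:4], digest)


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of the given bytes (MD5 is an assumption, see above)."""
    return hashlib.md5(data).hexdigest()


# The digest from the example maps to the path shown in the question.
print(shard_path_from_digest("67cabf2cf7418461ad53d9fd7e067049"))
# /67/ca/67cabf2cf7418461ad53d9fd7e067049

# Hashing an arbitrary identifier first, then sharding on the digest.
print(shard_path_from_digest(md5_hex(b"some-identifier")))
```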

I then realised that, since I have to create a unique file name (a UUID) for each incoming 'blob' anyway, I could perhaps just use the first four characters of the UUID itself, saving myself the need to hash the UUID, e.g.:

A UUID of ea5dc4cf-1b91-4a8f-8d56-69b7223d8954 can be used to store a file of the same name in the following directory structure: /ea/5d/ea5dc4cf-1b91-4a8f-8d56-69b7223d8954
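A minimal sketch of the UUID-based variant, assuming version 4 UUIDs from Python's standard `uuid` module and a storage root on the local filesystem. The helper name `store_blob` and the temporary directory are hypothetical and used only for illustration.

```python
import tempfile
import uuid
from pathlib import Path


def store_blob(root: Path, blob: bytes) -> Path:
    """Write blob to root/<chars 0-1>/<chars 2-3>/<uuid> and return that path."""
    name = str(uuid.uuid4())  # e.g. ea5dc4cf-1b91-4a8f-8d56-69b7223d8954
    target_dir = root / name[:2] / name[2:4]
    target_dir.mkdir(parents=True, exist_ok=True)  # create both levels on demand
    target = target_dir / name
    target.write_bytes(blob)
    return target


with tempfile.TemporaryDirectory() as tmp:
    path = store_blob(Path(tmp), b"example payload")
    print(path.relative_to(tmp))  # two directory levels followed by the UUID
```

Because the hyphen only appears after the eighth character, the first four characters of the string form are always hex digits, giving at most 256 directories per level.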

I have a good understanding of the uniqueness of UUIDs, but I can't find any decisive explanation as to whether the first four characters of a UUID will be as evenly spread as the first four characters of a hash, especially given that the leading octets of a time-based UUID are derived from a timestamp (source: https://www.rfc-editor.org/rfc/rfc4122).
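Whether the timestamp concern applies depends on the UUID version, which can be checked directly. The sketch below uses Python's standard `uuid` module to read the version of the example UUID and to compare how many distinct two-character prefixes appear in a quick burst of time-based (version 1) versus random (version 4) UUIDs. The helper name `distinct_prefixes` is hypothetical, and the burst behaviour assumes the generator uses the current clock, as typical implementations do.

```python
import uuid


def distinct_prefixes(make_uuid, count: int = 1000) -> int:
    """Count distinct first-two-character prefixes across `count` new UUIDs."""
    return len({str(make_uuid())[:2] for _ in range(count)})


example = uuid.UUID("ea5dc4cf-1b91-4a8f-8d56-69b7223d8954")
print("version of the example UUID:", example.version)  # 4

# Version 1 puts the low 32 bits of the timestamp first, so UUIDs created
# within a short burst tend to share their leading characters.
print("v1 distinct prefixes:", distinct_prefixes(uuid.uuid1))

# Version 4 fills those positions with random bits.
print("v4 distinct prefixes:", distinct_prefixes(uuid.uuid4))
```

In a version 1 UUID the first two hex characters are the top byte of the 32-bit `time_low` field, which counts 100-nanosecond ticks and so only changes roughly every 1.7 seconds. Files arriving in bursts would therefore cluster in a few directories, whereas arrivals spread over a long period would cycle through them.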

I did find this previous question but it seems far from decisively answered!

Is anyone able to help me understand this better or explain why the spread may or may not be as even as a hash?

bunoi asked Jan 31 '23 02:01



1 Answer

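I ran a test with a set of 10,000,000 version 4 UUIDs and conclude that the spread is very even. With 256 possible values for each pair of characters, the expected count per value is about 39,062, and the highest and lowest counts shown below are within roughly 2% of that. The results are as follows; I hope they help someone.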

+---------------+------------+--+----------------+------------+
| First 2 chars | frequency  |  | Second 2 chars | frequency  |
+---------------+------------+--+----------------+------------+
| ea            | 39781      |  | 3c             | 39624      |
+---------------+------------+--+----------------+------------+
| 57            | 39589      |  | 6e             | 39575      |
+---------------+------------+--+----------------+------------+
| 63            | 39566      |  | f6             | 39524      |
+---------------+------------+--+----------------+------------+
| etc.          | etc.       |  | etc.           | etc.       |
+---------------+------------+--+----------------+------------+
| middle rows of results removed to keep this concise.        |
+---------------+------------+--+----------------+------------+
| b3            | 38455      |  | cf             | 38572      |
+---------------+------------+--+----------------+------------+
| f8            | 38454      |  | 4a             | 38549      |
+---------------+------------+--+----------------+------------+
| d7            | 38448      |  | b1             | 38540      |
+---------------+------------+--+----------------+------------+
| Total         | 10,000,000 |  |                | 10,000,000 |
+---------------+------------+--+----------------+------------+
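A test along these lines could be reproduced with a short script. This is a sketch, not the code that produced the table above: the function name `prefix_counts` is hypothetical, and the sample size is an assumption chosen to be much smaller than 10,000,000 so that it runs in a second or two.

```python
import uuid
from collections import Counter


def prefix_counts(samples: int) -> tuple[Counter, Counter]:
    """Tally the first and second pairs of hex characters of random UUIDs."""
    first, second = Counter(), Counter()
    for _ in range(samples):
        hex_chars = uuid.uuid4().hex  # 32 hex characters, no hyphens
        first[hex_chars[:2]] += 1
        second[hex_chars[2:4]] += 1
    return first, second


SAMPLES = 200_000  # assumption: reduced from 10,000,000 for a quick run
first, second = prefix_counts(SAMPLES)
expected = SAMPLES / 256  # 256 possible values per pair of hex characters

for label, counts in (("first 2 chars", first), ("second 2 chars", second)):
    lowest, highest = min(counts.values()), max(counts.values())
    print(f"{label}: {len(counts)} values, min {lowest}, "
          f"max {highest}, expected {expected:.1f}")
```

Smaller samples show proportionally more scatter around the expected count, so the relative spread only approaches the tightness seen in the table as `SAMPLES` grows.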
bunoi answered Feb 05 '23 17:02
