
Generate valid, deterministic UUIDs for tests

For my Ruby test suite, I need predictable UUIDs. I am aware that UUIDs are by nature random and non-deterministic, and that this is good. But in the test suite, it would be useful to have UUIDs that can be reused across fixtures, data helpers, seeds, etc.

I now have a naive implementation that easily leads to invalid UUIDs:

def fake_uuid(character = "x")
  [8, 4, 4, 4, 12].map { |length| character * length }.join("-")
end

fake_uuid('a') #=> "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa" # This is valid
fake_uuid('z') #=> "zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz" # This is invalid, not hex.

I could, obviously, add checks that only a-f and 0-9 are allowed as input. An alternative would be to hardcode a pre-generated list of UUIDs and pick one based on arguments.
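
A minimal sketch of the input check described above, assuming a single hex digit is the only input that should be accepted (the default changes from "x" to "a", since "x" isn't hex). This only guarantees hex digits: the version and variant digits still repeat the input character, so strict validators may reject the result.

```ruby
# Sketch of the question's fake_uuid with a hex-only input check added.
# Assumption: exactly one hex digit is accepted; the default is "a" because
# the original default "x" is not a hex digit.
def fake_uuid(character = "a")
  char = character.to_s.downcase
  unless char.match?(/\A\h\z/) # \h matches a single hex digit
    raise ArgumentError, "expected a single hex digit, got #{character.inspect}"
  end

  [8, 4, 4, 4, 12].map { |length| char * length }.join("-")
end
```

With this, `fake_uuid("c")` and `fake_uuid("e")` give the readable company/employee IDs from the question, while `fake_uuid("z")` raises an `ArgumentError`.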

But I'm wondering, is there not a better way? Would UUIDv5 work for this? Is there a way to call SecureRandom.uuid to have it return the same UUID (for a thread or session)? Does it need an additional gem? Or is my approach the closest one can get?

Having it made up of all the same characters is not a requirement.
Having it somewhat readable is a big pro, but not a requirement. This way, you can e.g. ensure that a Company has a UUID cccccccc-cccc-cccc-cccc-cccccccccccc and its Employee the UUID eeeeeeee-eeee-eeee-eeee-eeeeeeeeeeee.

asked May 18 '26 at 11:05 by berkes


2 Answers

> I am aware that UUIDs are by nature random and non-deterministic, and that this is good.

That assumption is wrong.

RFC 4122 defines five versions of UUID (RFC 9562, which obsoletes it, adds versions 6, 7 and 8):

  • Versions 1 and 2 are based on the MAC address and a timestamp, and are thus deterministic only in the sense that the same computer at the same moment would theoretically produce the same UUID.
  • Versions 3 and 5 are based on a namespace and a name (hashed with MD5 and SHA-1, respectively), and are thus fully deterministic.
  • Version 4 is random.

So, if you use Version 3 or Version 5 UUIDs, they will be fully deterministic.
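
As an illustration, here is a sketch of a name-based (version 3 or 5) UUID generator using only Ruby's standard `digest` library; `name_based_uuid` and `NAMESPACE_DNS` are names chosen for this example, not an existing Ruby API. It follows the RFC 4122 recipe: hash the namespace UUID's 16 raw bytes followed by the name, keep the first 16 bytes, then overwrite the version and variant bits.

```ruby
require "digest/md5"
require "digest/sha1"

# Predefined DNS namespace UUID from RFC 4122, Appendix C.
NAMESPACE_DNS = "6ba7b810-9dad-11d1-80b4-00c04fd430c8"

# Name-based UUID: version 3 hashes with MD5, version 5 with SHA-1.
def name_based_uuid(namespace, name, version: 5)
  hasher = { 3 => Digest::MD5, 5 => Digest::SHA1 }.fetch(version)

  # Hash the namespace's 16 raw bytes followed by the name's bytes
  # (both as binary strings, so non-ASCII names concatenate safely).
  ns_bytes = [namespace.delete("-")].pack("H*")
  bytes = hasher.digest(ns_bytes + name.to_s.b).bytes.first(16)

  bytes[6] = (bytes[6] & 0x0f) | (version << 4) # version in the high nibble (M)
  bytes[8] = (bytes[8] & 0x3f) | 0x80           # RFC 4122 variant: top bits 10 (N)

  hex = bytes.pack("C*").unpack1("H*")
  [hex[0, 8], hex[8, 4], hex[12, 4], hex[16, 4], hex[20, 12]].join("-")
end
```

As a cross-check, `name_based_uuid(NAMESPACE_DNS, "python.org")` should return `886313e1-3b8a-5372-9b90-0c9aee199e5d`, the value Python's documentation gives for `uuid.uuid5(uuid.NAMESPACE_DNS, 'python.org')`. For fixtures you would presumably generate your own namespace UUID once and derive IDs from names such as `"company-1"` (a hypothetical naming scheme).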

answered May 21 '26 at 04:05 by Jörg W Mittag


UUIDs use two hex digits to denote their format (strictly speaking, only some of those digits' bits):

xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx
              ^    ^
        version    variant
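
A small sketch that reads the M and N digits from the positions shown in the diagram (the helper names are made up for this example): M is at string index 14, N at index 19, and the usual RFC 4122 variant requires N's top two bits to be `10`, i.e. N is 8, 9, a or b.

```ruby
# Loose format check: 8-4-4-4-12 hex digits.
UUID_FORMAT = /\A\h{8}-\h{4}-\h{4}-\h{4}-\h{12}\z/

def check_format!(uuid)
  raise ArgumentError, "not a UUID: #{uuid.inspect}" unless uuid.match?(UUID_FORMAT)
end

# M is the first digit of the third group (string index 14).
def uuid_version(uuid)
  check_format!(uuid)
  uuid[14].to_i(16)
end

# N is the first digit of the fourth group (string index 19); the RFC 4122
# variant has its top two bits set to 10.
def rfc4122_variant?(uuid)
  check_format!(uuid)
  (uuid[19].to_i(16) >> 2) == 0b10
end
```

For instance, `"aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"` passes the variant check (a is 1010 in binary) but reports version 10, which is not a defined version.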

The following pattern denotes version 4 (M=4), variant 1 (N=8, though 9, a or b would also do), which simply means "random bytes":

xxxxxxxx-xxxx-4xxx-8xxx-xxxxxxxxxxxx

You could use it as a template to generate fake (but valid) UUIDs from a sequence number, as suggested in the comments:

def fake_uuid(n)
  '00000000-0000-4000-8000-%012x' % n
end

fake_uuid(1) #=> "00000000-0000-4000-8000-000000000001"
fake_uuid(2) #=> "00000000-0000-4000-8000-000000000002"
fake_uuid(3) #=> "00000000-0000-4000-8000-000000000003"
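
One caveat with the template: `'%012x' % n` silently emits more than 12 digits once n reaches 16**12, and Ruby formats negative numbers with a `..` prefix (`'%x' % -1` returns `"..f"`), so both cases produce malformed strings. A sketch with a range guard added (the guard is not part of the original answer):

```ruby
# The last group holds 12 hex digits, so n must fit in 48 bits.
MAX_SEQUENCE = 16**12

def fake_uuid(n)
  unless n.is_a?(Integer) && n >= 0 && n < MAX_SEQUENCE
    raise ArgumentError, "sequence number out of range: #{n.inspect}"
  end

  # Same version-4 / variant-1 template as above.
  format("00000000-0000-4000-8000-%012x", n)
end
```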

> Having it somewhat readable is a big pro ...

There are plenty of unused digits left to encode more data:

def fake_uuid(klass, n)
  k = { Company => 1, Employee => 2 }.fetch(klass, 0)

  '%08x-0000-4000-8000-%012x' % [k, n]
end

fake_uuid(Company, 1)   #=> "00000001-0000-4000-8000-000000000001"
fake_uuid(Company, 2)   #=> "00000001-0000-4000-8000-000000000002"

fake_uuid(Employee, 1)  #=> "00000002-0000-4000-8000-000000000001"
fake_uuid(Employee, 2)  #=> "00000002-0000-4000-8000-000000000002"

#                            ^^^^^^^^                ^^^^^^^^^^^^
#                              class                   sequence
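
A standalone sketch of the same scheme, keyed on symbols so it runs without the application's `Company` and `Employee` classes; `TYPE_CODES` and the decoding helper `parse_fake_uuid` are hypothetical additions. Using `fetch` without a default means a typo in the type raises instead of silently mapping to 0, and decoding can help when a failing test prints an ID and you want to know which fixture it came from.

```ruby
# Hypothetical type codes; the answer above keys on classes instead.
TYPE_CODES = { company: 1, employee: 2 }.freeze

def fake_uuid(type, n)
  k = TYPE_CODES.fetch(type) # unknown types raise KeyError instead of mapping to 0
  format("%08x-0000-4000-8000-%012x", k, n)
end

# Reverse the encoding: first group -> type code, last group -> sequence number.
def parse_fake_uuid(uuid)
  k = uuid[0, 8].to_i(16)
  n = uuid[24, 12].to_i(16)
  [TYPE_CODES.key(k), n]
end
```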

answered May 21 '26 at 02:05 by Stefan