 

Generate unique ID in Java, to label groups of related entries in a log

There are several posts on SO on this topic. Each of them talks about a specific approach, so I wanted to get a comparison in one question.

Using new Date() as unique identifier

Generating a globally unique identifier in Java

I am trying to implement a feature where we can identify certain events in the log file. These events need to be associated with a unique ID, and I am trying to come up with a strategy for generating it. The ID has to have two parts: some static information plus some dynamic information. The logs can then be searched for the pattern when events need debugging. I have three options:

  • static info + Joda-Time date-time ("abc" + 2014-01-30T12:36:12.703)
  • static info + Atomic Integer
  • static info + UUID
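For comparison, the three candidate formats might look like this in code. This is a hypothetical sketch: the class name, the "abc" prefix, and the use of java.time.LocalDateTime in place of Joda-Time (to avoid a third-party dependency) are all assumptions, not part of the question.

```java
import java.time.LocalDateTime;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the three candidate ID formats: a static prefix plus a dynamic part.
public class LogIdStrategies {
    // Static part of the ID (placeholder value taken from the question's example)
    static final String PREFIX = "abc";

    // Shared counter for strategy 2; AtomicInteger makes increments thread-safe
    static final AtomicInteger COUNTER = new AtomicInteger();

    // Strategy 1: prefix + current date-time (java.time used here in place of Joda-Time)
    static String dateTimeId() {
        return PREFIX + LocalDateTime.now();
    }

    // Strategy 2: prefix + next value of an in-memory counter
    static String counterId() {
        return PREFIX + COUNTER.incrementAndGet();
    }

    // Strategy 3: prefix + random (version 4) UUID
    static String uuidId() {
        return PREFIX + UUID.randomUUID();
    }

    public static void main(String[] args) {
        System.out.println(dateTimeId());
        System.out.println(counterId());
        System.out.println(uuidId());
    }
}
```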

For the scope of this question, multiple JVMs is not a consideration. I need to generate unique IDs in an efficient manner on one JVM. Also, I will not be able to use a database dependent solution.

Which of the three strategies above works best?

  • If none of them, is there another strategy?
  • Is the Joda-Time based strategy robust? There is a single JVM, but there will be concurrent users, so there can be concurrent events.
  • With one of the above (or another) strategy, do I need to make my method thread-safe / synchronized?
souser asked Feb 03 '14 20:02




2 Answers

I have had the same need as you: distinguishing a thread of related entries interleaved with other unrelated entries in a log. I have tried all three of your suggested approaches. My experience was in 4D, not Java, but the issues are similar.

Date-Time

In my case, I was using a date-time value resolved to whole seconds. That is simply too large a granularity. I easily had collisions where multiple events started within the same second. Damn those speedy computers!

In your case with either the bundled java.util.Date or Joda-Time (highly recommended for other purposes), both resolve to milliseconds. A millisecond is a long time in modern computers, so I don't recommend this.
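A minimal sketch of the collision problem, assuming a single JVM: generate many millisecond timestamps in a tight loop and count how many repeat. System.currentTimeMillis() is the same clock that new Date() reads. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: how many millisecond timestamps repeat when IDs are generated in a tight loop?
public class MillisCollisionDemo {
    // Generates `count` timestamp-based IDs and returns how many were duplicates
    static int countDuplicates(int count) {
        Set<Long> seen = new HashSet<>();
        int duplicates = 0;
        for (int i = 0; i < count; i++) {
            long millis = System.currentTimeMillis();  // same resolution as java.util.Date
            if (!seen.add(millis)) {
                duplicates++;  // this "unique" ID was already handed out
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println(countDuplicates(n) + " of " + n + " IDs were duplicates");
    }
}
```

On any ordinary machine nearly all of the generated values are duplicates, because thousands of loop iterations fit inside one millisecond.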

In Java 8, the new java.time.* package (inspired by Joda-Time, defined by JSR 310) can resolve to nanoseconds. This might seem to make a better identifier, but no. For one thing, your computer's physical time-keeping clock may not support such a fine resolution. For another, computers keep getting faster. Lastly, a computer's clock can be reset; indeed it is reset often, as computer clocks drift quite a bit. Modern OSes reset their clocks by frequently checking with a time server, either locally or over the Internet.
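To see what the clock actually delivers, one rough approach is to sample Instant.now() repeatedly and record the smallest positive step. This is a sketch, not a benchmark: on Java 8 the default clock is backed by System.currentTimeMillis(), so the step is typically a millisecond, while later JDKs may capture finer values depending on the platform. The class name is hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

// Sketch: estimate the effective tick of the system clock behind Instant.now().
// Instant can *represent* nanoseconds, but the underlying clock may tick far more coarsely.
public class ClockResolution {
    // Samples Instant.now() repeatedly and returns the smallest positive step observed,
    // or null if the clock never advanced during the sampling
    static Duration smallestTick(int samples) {
        Duration smallest = null;
        Instant previous = Instant.now();
        for (int i = 0; i < samples; i++) {
            Instant current = Instant.now();
            Duration step = Duration.between(previous, current);
            // Ignore repeats (zero) and backward jumps (negative, e.g. after a clock reset)
            if (!step.isZero() && !step.isNegative()
                    && (smallest == null || step.compareTo(smallest) < 0)) {
                smallest = step;
            }
            previous = current;
        }
        return smallest;
    }

    public static void main(String[] args) {
        System.out.println("Smallest observed tick: " + smallestTick(1_000_000));
    }
}
```

Note that the code has to guard against negative steps at all, which is exactly the clock-reset problem described above.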

Also, logs already have a timestamp, so we are not getting any extra benefit by using a date-time as our identifier. Indeed, having a second date-time in the log entry may actually cause confusion.

Serial Number

By "Atomic Integer", I assume you mean a serial number that increments with each new ID.

This seems overkill for your purpose.

  • You don't care about the sequence; it has no meaning for this purpose of grouping log entries. You don't really care whether one group came before or after another.
  • Maintaining a sequence is a pain, a point of potential failure. I've always eventually run into administrative problems with maintaining a sequence.

So this approach adds risk without adding any special benefit.
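Within one JVM the counter itself is easy to write; the administrative pain shows up elsewhere. A hypothetical sketch using AtomicLong (names are illustrative):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the serial-number approach on a single JVM.
public class SequenceIds {
    // In-memory only: starts again at 1 every time the JVM restarts,
    // so IDs repeat across restarts unless the last value is persisted somewhere
    private static final AtomicLong NEXT = new AtomicLong();

    // incrementAndGet is atomic, so no synchronized keyword is needed
    static String nextId(String prefix) {
        return prefix + "-" + NEXT.incrementAndGet();
    }

    public static void main(String[] args) {
        System.out.println(nextId("abc"));  // prints abc-1 on a fresh JVM
        System.out.println(nextId("abc"));  // prints abc-2
    }
}
```

The restart behavior is the kind of failure the author alludes to: after a restart, "abc-1" appears in the same log file twice, for unrelated events.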

UUID

Bingo! Just what you need.

A UUID is easily generated, using the bundled java.util.UUID class's ability to generate Version 3 or 4 UUIDs, a third-party library, or the uuidgen command-line tool.

For a very high volume, a Version 1 UUID (MAC address + date-time + random number) would be best. For logging, a Version 4 UUID (entirely random) is absolutely acceptable.
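A short sketch of the two versions java.util.UUID can generate by itself: randomUUID() for Version 4 and nameUUIDFromBytes() for Version 3. The JDK has no built-in Version 1 generator, so that option would need a third-party library. The name "order-import" is a hypothetical example.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Sketch: the two UUID versions java.util.UUID can generate on its own.
public class UuidVersions {
    public static void main(String[] args) {
        // Version 4: random bits from a cryptographically strong generator
        UUID random = UUID.randomUUID();
        System.out.println(random + " version " + random.version());  // version 4

        // Version 3: MD5 hash of a name, so the same input always gives the same UUID
        UUID named = UUID.nameUUIDFromBytes("order-import".getBytes(StandardCharsets.UTF_8));
        System.out.println(named + " version " + named.version());    // version 3
    }
}
```

For tagging log groups, Version 4 is the natural fit: a Version 3 UUID is deterministic, so two runs with the same name would share an ID.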

Collisions are not a realistic concern, especially for the limited number of values you would generate for logs. I'm amazed by people who, failing to comprehend the numbers, say they would never replace a sequence with a UUID. Yet when pressed, every single programmer and sysadmin I know has experienced failures with at least one sequence.

No concerns about thread-safety. No concerns about contention (see my test results on another answer of mine).
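A rough illustration of the thread-safety point, assuming a fixed thread pool: several threads call UUID.randomUUID() at once with no locking of our own, and every result is collected. The class and method names are hypothetical; this is a sanity check, not a contention benchmark.

```java
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: several threads calling UUID.randomUUID() at once, with no locking of our own.
public class ConcurrentUuids {
    // Generates perThread UUIDs on each of `threads` threads; returns all of them
    static Set<UUID> generate(int threads, int perThread) {
        Set<UUID> ids = ConcurrentHashMap.newKeySet();  // thread-safe set for collecting results
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    ids.add(UUID.randomUUID());  // randomUUID itself needs no synchronized block
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve interrupt status
        }
        return ids;
    }

    public static void main(String[] args) {
        Set<UUID> ids = generate(8, 10_000);
        System.out.println(ids.size() + " distinct UUIDs from 80000 calls");
    }
}
```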

Another benefit of a UUID is that its usual hexadecimal representation, such as:

6536ca53-bcad-4552-977f-16945fee13e2

…is easily recognizable. When recognized, the reader immediately knows that string is meant to be a unique identifier. So its presence in your log is self-documenting.

I've found UUIDs to be the Duct Tape of computing. I keep finding new uses for them.

So, at the start of the code in question, generate a UUID and then embed that into every one of the related log entries.
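A minimal sketch of that pattern, using java.util.logging so it runs with the JDK alone. The method processOrder and its messages are hypothetical placeholders for "the code in question".

```java
import java.util.UUID;
import java.util.logging.Logger;

// Sketch: one UUID per unit of work, stamped onto every related log line.
public class TaggedLogging {
    private static final Logger LOG = Logger.getLogger(TaggedLogging.class.getName());

    // Hypothetical unit of work; the steps and messages are placeholders
    static String processOrder(String orderName) {
        String runId = UUID.randomUUID().toString();  // generated once, at the start
        LOG.info(runId + " start processing " + orderName);
        LOG.info(runId + " validated " + orderName);
        LOG.info(runId + " finished " + orderName);
        return runId;  // returned only so callers can see which ID was used
    }

    public static void main(String[] args) {
        processOrder("order-42");
    }
}
```

Searching the log for the first few characters of runId then pulls out just that group of entries.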

While the hex string representation of a UUID is hard to read and write, in practice you need only scan a few of the digits at the beginning or end. Or use copy-paste with search and filter features in our modern console tools.

A few factoids

  • A UUID is known in the Microsoft world as a GUID.
  • A UUID is not a string, but a 128-bit value. Bits, just bits in memory, "on"/"off" values. Some databases, such as Postgres, know how to handle and store UUID as such 128-bit values. If we wish to show those bits to humans, we could use a series of 128 digits of "1" & "0". But humans do not do well trying to read or write 128 digits of ones and zeros. So we use the hexadecimal representation. But even 32 hex digits is too much for humans, so we break the string into groups separated with hyphens as shown above, for a total of 36 characters.
  • The spec for a UUID is quite clear that a hexadecimal representation should be lowercase. The spec says that when creating a UUID from a string input, uppercase should be tolerated. But when generating a hex string, it should be lowercase. Many implementations of UUIDs ignore this requirement. I suggest sticking to the spec and converting your UUID hex strings to lowercase.
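The points above can be checked with java.util.UUID directly: the value is held as two 64-bit halves, fromString tolerates uppercase input, and toString produces the lowercase, hyphenated 36-character form. A small sketch reusing the example UUID from this answer (the class name is hypothetical):

```java
import java.util.Locale;
import java.util.UUID;

// Sketch: a UUID is two 64-bit halves; its text form is just one way to display them.
public class UuidBits {
    public static void main(String[] args) {
        // fromString tolerates uppercase input, as the spec requires
        UUID id = UUID.fromString("6536CA53-BCAD-4552-977F-16945FEE13E2");

        // The underlying 128 bits, held as two longs
        long high = id.getMostSignificantBits();
        long low = id.getLeastSignificantBits();
        System.out.println(Long.toHexString(high) + " " + Long.toHexString(low));
        // prints 6536ca53bcad4552 977f16945fee13e2

        // Canonical form: 32 hex digits in 5 hyphen-separated groups, 36 characters
        String text = id.toString();
        System.out.println(text + " (" + text.length() + " chars)");
        // prints 6536ca53-bcad-4552-977f-16945fee13e2 (36 chars)

        // For UUID strings from other sources, normalize to lowercase before storing or comparing
        String fromElsewhere = "6536CA53-BCAD-4552-977F-16945FEE13E2";
        System.out.println(fromElsewhere.toLowerCase(Locale.ROOT));
    }
}
```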

MDC – Mapped Diagnostic Context

I have not yet used MDC, but want to point it out…

Some logging frameworks are adding support for this idea of tagging related log entries. Such support is called Mapped Diagnostic Context (MDC). The MDC manages contextual information on a per-thread basis.

A quick introductory article is "Log4j MDC (Mapped Diagnostic Context): What and Why".

The best logging façade, SLF4J, offers such an MDC feature. The best implementation of that façade, Logback, has a chapter documenting its MDC feature.
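In SLF4J the calls are MDC.put(key, value) and MDC.remove(key), and a Logback pattern can print the value with %X{key}. The sketch below does not use SLF4J; it imitates the per-thread idea with a plain ThreadLocal so it runs with the JDK alone. The class MiniMdc and its methods are hypothetical.

```java
import java.util.UUID;

// Hypothetical stand-in for an MDC: a per-thread slot holding the current log group ID.
// This is NOT the SLF4J API; it only imitates the per-thread idea with the JDK alone.
public class MiniMdc {
    private static final ThreadLocal<String> GROUP_ID = new ThreadLocal<>();

    // Every log call on this thread picks up the current group ID automatically
    static String format(String message) {
        String id = GROUP_ID.get();
        return (id == null ? "-" : id) + " " + message;
    }

    // Runs a unit of work with a fresh ID bound to the current thread, then clears it
    static void runTagged(Runnable work) {
        GROUP_ID.set(UUID.randomUUID().toString());
        try {
            work.run();
        } finally {
            GROUP_ID.remove();  // avoid leaking the ID into unrelated work on pooled threads
        }
    }

    public static void main(String[] args) {
        runTagged(() -> {
            System.out.println(format("step one"));
            System.out.println(format("step two"));  // same ID as step one
        });
        System.out.println(format("outside"));      // prints "- outside"
    }
}
```

The appeal of MDC over passing the ID by hand is visible here: the code doing the logging never mentions the ID, yet every line within runTagged carries it.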

Basil Bourque answered Oct 07 '22 18:10


Computers are fast; using time to attempt to create a unique value is going to fail.

Instead use a UUID. From the Java SE 6 UUID API documentation: "[UUID is] A class that represents an immutable universally unique identifier (UUID)."

Here is some code:

import java.util.UUID;

public class Example {
    private String id;

    public Example() {
        // random (version 4) UUID as a 36-character string
        id = UUID.randomUUID().toString();
    }
}
DwB answered Oct 07 '22 18:10