
Is there an object-change-tracking/versioning Java API out there?


I know of at least two byte-code enhancers that modify the "object model" at runtime to allow transactions to be performed transparently. One of them is part of Versant VOD, which I use at work every day, and the other is part of Terracotta. There are probably quite a few others, for example in ORMs, but Versant takes care of that at my company.

My question is: is there such an open-source API that can be used on its own, independent of the product that it was designed for? You could say a "hackable" API. It should only track changes, not read access, which would slow down the code significantly. In other words, it should not require explicit read/write locking. This requires either access to all classes that perform changes, not just to the data model, or keeping some form of "previous version" in memory to do a comparison.

The problem that I'm trying to solve is that I have "large" (32K to 256K) object graphs that are "serialized" in a (NoSQL) DB. They are long-lived and must be re-serialized regularly to keep a "history" of the changes. But they are rather expensive to serialize, and most changes are minor.

I could serialize them fully each time and run a binary diff on the stream, but that sounds very CPU intensive. A better solution would be an API that intercepts write operations on the model to record the changes in a log (a change "protocol"), so that after the initial "image" is stored, only the protocol needs to be stored.

I've found some questions talking about Apache Commons Beanutils to compare objects, but that is not useful for in-place changes; I would need to make a complete clone of the model between every "business transaction".

To reiterate, I'm looking for an "in-memory" API, within the same JVM, which does not involve any external server application. APIs involving native code are OK if they are available on Win, Mac & Linux. The API does not have to be currently packaged independently; it just has to be possible to extract it from the "parent project" to form an independent API (the parent project license must allow this).

My object graphs will involve many large arrays, and so that needs to be supported efficiently.

The changes are not desired only for auditing, but so that they can be replayed, or undone. More precisely, with the deserialized initial graph, and a list of changes, I should arrive at an identical end graph. Also, starting with the end graph, it should be possible to go back to the initial graph by applying the changes in reverse. This uses exactly the same functionality, but requires the change protocol to keep the old value in addition to the new value.
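The replay/undo requirement can be illustrated with a minimal sketch. It assumes all writes to a `long[]` go through a hypothetical `ArrayChangeLog` class (not part of any existing library); each record keeps both the old and the new value, so the same log can be applied forward or in reverse.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical change log for a long[]; an illustration, not an existing API. */
final class ArrayChangeLog {
    /** One recorded write: the index plus the value before and after. */
    private static final class Change {
        final int index;
        final long oldValue;
        final long newValue;

        Change(int index, long oldValue, long newValue) {
            this.index = index;
            this.oldValue = oldValue;
            this.newValue = newValue;
        }
    }

    private final List<Change> changes = new ArrayList<>();

    /** Performs the write and records it; writes that change nothing are not logged. */
    void set(long[] target, int index, long newValue) {
        long oldValue = target[index];
        if (oldValue == newValue) {
            return;
        }
        changes.add(new Change(index, oldValue, newValue));
        target[index] = newValue;
    }

    /** Applies the log in order to a copy of the initial state, yielding the end state. */
    void replay(long[] initialState) {
        for (Change c : changes) {
            initialState[c.index] = c.newValue;
        }
    }

    /** Applies the log in reverse order to the end state, yielding the initial state. */
    void undo(long[] endState) {
        for (int i = changes.size() - 1; i >= 0; i--) {
            Change c = changes.get(i);
            endState[c.index] = c.oldValue;
        }
    }

    int size() {
        return changes.size();
    }
}
```

Only the writes are intercepted, so reads stay at full speed. The cost is that every class that mutates the array has to call `set` instead of assigning directly, which is what a byte-code enhancer would otherwise automate.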

The API license should be compatible with commercial use.

[EDIT] So far I have not received a useful answer, and it does not seem like what I want exists. That leaves me with only one option: make it happen. I'll post a link here as an answer when I have a working implementation, as this is the next step in my project and I cannot go forward without it.

[EDIT] I found by accident this somewhat related question: Is there a Java library that can "diff" two Objects?

Sebastien Diot asked Apr 06 '12 13:04


2 Answers

Kryo v1 had a serializer that knows about the last data that was serialized and only emits a delta. When reading, it knows about the last data received and applies the delta. The delta is done at the byte level. Here is the serializer. Most of the work is done by this class. This could be used in a few useful ways, e.g. networking similar to Quake 3.

This was omitted in Kryo v2 because AFAIK it had never been put to use. Also, it did not have an extensive set of tests. It could be ported though and may do what you need, or serve as the basis for what you need.
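To make the byte-level idea concrete, here is a minimal sketch. It is not the Kryo v1 delta algorithm; it swaps in a simpler technique: XOR the new bytes against the previous bytes and deflate the result with `java.util.zip`. The class name `XorDelta` and the 4-byte length header are assumptions made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical byte-level delta: XOR against the previous bytes, then deflate. */
final class XorDelta {
    static byte[] encode(byte[] previous, byte[] current) {
        // Unchanged bytes XOR to zero; bytes beyond the old length are XORed with 0.
        byte[] xor = new byte[current.length];
        for (int i = 0; i < current.length; i++) {
            byte old = i < previous.length ? previous[i] : 0;
            xor[i] = (byte) (current[i] ^ old);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // 4-byte big-endian header holding the length of the new data.
        out.write(current.length >>> 24);
        out.write(current.length >>> 16);
        out.write(current.length >>> 8);
        out.write(current.length);
        Deflater deflater = new Deflater();
        deflater.setInput(xor);
        deflater.finish();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decode(byte[] previous, byte[] delta) {
        int length = ((delta[0] & 0xFF) << 24) | ((delta[1] & 0xFF) << 16)
                | ((delta[2] & 0xFF) << 8) | (delta[3] & 0xFF);
        byte[] current = new byte[length];
        Inflater inflater = new Inflater();
        inflater.setInput(delta, 4, delta.length - 4);
        try {
            int offset = 0;
            while (offset < length) {
                int n = inflater.inflate(current, offset, length - offset);
                if (n == 0 && (inflater.finished() || inflater.needsInput())) {
                    throw new IllegalArgumentException("truncated delta");
                }
                offset += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt delta", e);
        } finally {
            inflater.end();
        }
        // Undo the XOR to recover the new bytes.
        for (int i = 0; i < length && i < previous.length; i++) {
            current[i] ^= previous[i];
        }
        return current;
    }
}
```

This only pays off when changes happen in place, so that unchanged bytes keep their offsets and XOR to long runs of zeros. An insertion that shifts the rest of the stream defeats it, and it still requires a full serialization pass for every version.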

The above was also posted on the JVM serializers mailing list.

Doing it at the object level would be a bit tricky. You could write something similar to FieldSerializer that walks two object graphs simultaneously. This would be standalone code though, not a Kryo serializer. At each level you could call equals. Write a byte so that when you read you know if it was equals. If not equals, use Kryo to write the object. Equals would be called many times for the same object, especially for deeply nested objects.
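As a rough illustration of comparing two objects field by field, the sketch below uses plain reflection and covers a single level only; recursing into nested objects and writing the result with Kryo are left out. `FieldDiffer` and `DemoNode` are hypothetical names, and `Objects.deepEquals` is used in place of `equals` so that array fields are compared by content.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical one-level field comparison of two objects of the same class. */
final class FieldDiffer {
    /** Returns the names of the instance fields whose values differ. */
    static List<String> changedFields(Object oldObject, Object newObject) {
        if (oldObject.getClass() != newObject.getClass()) {
            throw new IllegalArgumentException("objects must have the same class");
        }
        List<String> changed = new ArrayList<>();
        // Walk the class hierarchy so inherited fields are compared too.
        for (Class<?> c = oldObject.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                    continue;
                }
                field.setAccessible(true);
                try {
                    // deepEquals compares arrays by content and everything else via equals.
                    if (!Objects.deepEquals(field.get(oldObject), field.get(newObject))) {
                        changed.add(field.getName());
                    }
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return changed;
    }
}

/** Small demo type used only to exercise the differ. */
final class DemoNode {
    int id;
    String name;
    long[] data;

    DemoNode(int id, String name, long[] data) {
        this.id = id;
        this.name = name;
        this.data = data;
    }
}
```

Note that this needs both versions of the object in memory at once, which is the cloning cost the question wants to avoid.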

Another way you might do it is to only do the above for scalars and strings, i.e. only values written by the Output class. The problem is walking two object graphs. To use Kryo I think you'd have to duplicate all the serializers to know about the other object graph.

Possibly you could use Kryo with your own Output that collects values in a list instead of writing them. Use this to "serialize" your old object graph. Now write another version of your own Output that takes this list and uses it to serialize your new object graph. Each time a value is written, first check it against the next object in your list. If equals, write a 1. If not equals, write a 0 and then the value.
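The flag-per-value scheme can be sketched without Kryo. The example below assumes the collected values are plain `int`s in two arrays of equal length and uses `java.io` data streams where the answer would use custom Output/Input subclasses; `FlaggedIntDelta` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical flag-per-value delta: 1 means unchanged, 0 is followed by the new value. */
final class FlaggedIntDelta {
    static byte[] encode(int[] oldValues, int[] newValues) {
        if (oldValues.length != newValues.length) {
            throw new IllegalArgumentException("both graphs must yield the same number of values");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int i = 0; i < newValues.length; i++) {
                if (oldValues[i] == newValues[i]) {
                    out.writeByte(1);
                } else {
                    out.writeByte(0);
                    out.writeInt(newValues[i]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static int[] decode(int[] oldValues, byte[] delta) {
        int[] result = new int[oldValues.length];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(delta))) {
            for (int i = 0; i < result.length; i++) {
                // A flag of 1 reuses the old value; otherwise the new value follows.
                result[i] = in.readByte() == 1 ? oldValues[i] : in.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

With one changed value out of four, the delta is four flag bytes plus one 4-byte `int`, so 8 bytes instead of 16.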

This could be made more space efficient by using the first Output twice, once on the old and once on the new graph. Now you have two lists of values. Use these to write a bitstring denoting which are equal. This saves space over writing a whole byte for each value, but has the overhead of an extra list. Finally, write all the values that are not equal.

To finish this idea, you need to be able to deserialize the data. You'll need your own version of the Input class that takes a list of values from the old object graph. Your Input first reads the bitstring (or a byte per value). For a value that was equal, it returns the value from the list instead of reading from the data. If a value was not equal, it calls the super method to read from the data.
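The bitstring variant, including the read side, might look like the following sketch. It works on two already collected value lists rather than on Kryo's Output/Input classes, and `BitmaskDelta` is a hypothetical name. A `java.util.BitSet` marks which positions are equal, and only the unequal values are kept.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Objects;

/** Hypothetical bitmask delta between two value lists collected from old and new graphs. */
final class BitmaskDelta {
    private final int size;                                  // number of values in the new list
    private final BitSet equal = new BitSet();               // bit i set: value i is unchanged
    private final List<Object> changed = new ArrayList<>();  // changed values, in order

    private BitmaskDelta(int size) {
        this.size = size;
    }

    static BitmaskDelta between(List<?> oldValues, List<?> newValues) {
        BitmaskDelta delta = new BitmaskDelta(newValues.size());
        for (int i = 0; i < newValues.size(); i++) {
            Object value = newValues.get(i);
            if (i < oldValues.size() && Objects.equals(oldValues.get(i), value)) {
                delta.equal.set(i);
            } else {
                delta.changed.add(value);
            }
        }
        return delta;
    }

    /** Rebuilds the new value list from the old one plus this delta. */
    List<Object> applyTo(List<?> oldValues) {
        List<Object> result = new ArrayList<>(size);
        int next = 0;
        for (int i = 0; i < size; i++) {
            result.add(equal.get(i) ? oldValues.get(i) : changed.get(next++));
        }
        return result;
    }

    int changedCount() {
        return changed.size();
    }
}
```

The mask costs one bit per value instead of one byte, at the price of holding both value lists in memory while the delta is built.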

I'm not sure if this would be faster than doing it at the byte level. If I had to guess I'd say it probably would be faster. Storing all values in a list will be lots of boxing/unboxing, and this approach still assigns all fields even if they haven't changed. I doubt performance will be a problem either way, so I'd probably just choose the easier approach. Hard to say which that is tho... resurrect the delta stuff or write your own Output/Input classes.

If you feel like contributing back to Kryo, that would of course be great. :)

NateS answered Sep 28 '22 03:09


Take a look at the Content Repository API for Java (JCR); it is used by Artifactory to manage Maven dependencies. Apache Jackrabbit is the reference implementation of this JSR (JSR-283, which is version 2 of the API).

Diego Lins de Freitas answered Sep 28 '22 03:09