Is there an established representation for the difference between two JSONs?

Tags:

diff

Are there any established or existing formats or conventions for representing the diff between two JSON documents?

Lets say that two remote nodes (or a server/client) both have some data represented as a potentially complex JSON, the structure of which is not known before runtime. One wants to send an update to the other, but without sending the whole state as one large JSON. Instead, just the delta. What would be a good way to represent the delta (or diff) between any two JSON documents? They will likely be very similar (one small change), but might not be.

916

asked Mar 23 '13 23:03

derekv

2 Answers

JSON documents are essentially trees, with leaves containing name/value pairs.

What you want to do is transmit a minimal tree delta: the smallest set of edits that converts one tree into another.

Computing tree deltas is a bit of an art, partly because it depends on the kinds of deltas you allow (just insert/delete leaf? swap subtrees? move subtree? duplicate subtree? rename names? replace values?). You also need to take into account semantic equivalences; if you commute the position of two subtrees, is the result semantically different? (Your delta detector may see such a tree swap; the semantic identity check may eliminate it as not interesting). If you duplicate a subtree, is the answer semantically different? (I think for JSON the effective answer is "no").

You need something like a dynamic programming algorithm to determine such a minimal delta; you can take inspiration from Levenshtein distance on strings.

This is a common issue of interest to programmers regarding source code. Think of a JSON document as source code, and see the answers at https://stackoverflow.com/q/5779160/120163 for further discussion.

137

answered Oct 21 '22 13:10

Ira Baxter

As Ira pointed out, there are some options along the lines of Levenshtein, but you'd be looking at serializing your object and comparing it lexicographically which as Ira mentioned would not take into account the JSON-specific language diff that you are looking for (two trees could be identical JSON but very different by Levenshtein distance). What you want is definitely the tree edit distance.

So to add some detail around the art of tree edit distance, the known algorithms used in this space are typically Zhang & Shasha or Klein for example and you can find python implementation of Zhang & Shasha. These algorithms will obtain the minimum number of edits to convert one tree to another thus providing your diff. However, they are somewhat slow O(n^2) at absolute best, so if you are comparing a large number of JSON objects or files, you will find yourself perfecting your golf game, doing the dishes, laundry, bathing your pets and other household miscellany.

And this is really where the art that Ira speaks of really is, because these kinds of algorithms are difficult and computationally expensive. So what you may do is get creative. One method is narrowing down the number of objects that have to be compared. For example, why calculate edit distance between two JSON objects that are clearly more similar to an intermediate than to each other? Don't calculate edit distance on objects that are identical via lexicographic comparison, If two objects are somewhat or dramatically different, perhaps forget the diff and just say there needs to be an outright replacement.

In order to apply the "art" of tree edit distance, that is saving yourself unnecessary CPU cycles, what you need is a way to provide metrics around what is meant by "somewhat similar" or "dramatically different".

To that end I've written an implementation of PQ-Gram tree edit distance approximation algorithm (http://www.vldb2005.org/program/paper/wed/p301-augsten.pdf) that you can find on github for use in Node.js or in the browser (https://github.com/hoonto/jqgram.git) based on the existing PyGram Python code (https://github.com/Sycondaman/PyGram).

PQ-Gram is much, much faster than true edit distance algorithms operating in O(n log n) time and O(n) space where n is the number of nodes.

So my recommendation is to use jqgram to very quickly get a feel for what you are looking at in terms of JSON object edit distance metrics. Determine which JSON objects should be compared, versus just replaced, and then when you want the true distance to get the diff utilize Klein or Zhang & Shasha to get the actual diff.

Here's the jqgram JSON object tree edit distance approximation example taken straight out of the README for the jqgram implementation on github:

var jq = require("jqgram").jqgram;
var root1 = {
    "thelabel": "a",
    "thekids": [
        { "thelabel": "b",
        "thekids": [
            { "thelabel": "c" },
            { "thelabel": "d" }
        ]},
        { "thelabel": "e" },
        { "thelabel": "f" }
    ]
}

var root2 = {
    "name": "a",
    "kiddos": [
        { "name": "b",
        "kiddos": [
            { "name": "c" },
            { "name": "d" },
            { "name": "y" }
        ]},
        { "name": "e" },
        { "name": "x" }
    ]
}

jq.distance({
    root: root1,
    lfn: function(node){ return node.thelabel; },
    cfn: function(node){ return node.thekids; }
},{
    root: root2,
    lfn: function(node){ return node.name; },
    cfn: function(node){ return node.kiddos; }
},{ p:2, q:3, depth:10 },
function(result) {
    console.log(result.distance);
});

The lfn and cfn parameters specify how each JSON tree should determine the node label names and the children array for each tree root independently so that you can do fun things like compare JSON objects from different sources. All you need to do is provide those functions along with each root and jqgram will do the rest, calling your lfn and cfn provided functions to build out the trees.

answered Oct 21 '22 14:10

Matt Mullens

Related questions
                            
                                Retrofit 2 : send files with json object
                            
                                How to send a POST with a JSON in a WebRequest() call using MQL4?
                            
                                How to parse ld+json using python
                            
                                Handling JSON single object and array
                            
                                Generate JSON string from dynamic ExpandoObject
                            
                                wcf json web service
                            
                                Using HTTP status codes to reflect success/failure of Web service request?
                            
                                Passing parameters in a jQuery ajax call to an ASP.NET webmethod
                            
                                Convert associative array to XML in PHP
                            
                                grails.converters.JSON except few properties
                            
                                How to return correct response on error in ASP.NET MVC 3.0 via JSON?
                            
                                jsoncpp formatting problems
                            
                                .Net 4: How to reference a dynamic object with property named "return"
                            
                                org.json.JSONException: Unterminated object at character 14
                            
                                Jackson : Converting JSON property to nested Object with Dot Notation
                            
                                Where do I specify Jackson SerializationConfig.Feature settings in Spring 3.1
                            
                                Out of memory exception in gson.fromJson()
                            
                                How to create this json string using Jackson?
                            
                                How to pretty-print JSON script in MVC 4 API
                            
                                Split a simple JSON structure using a regular expression

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With