
How to cryptographically hash a JSON object?

The following question is more complex than it may first seem.

Assume that I've got an arbitrary JSON object, one that may contain any amount of data, including other nested JSON objects. What I want is a cryptographic hash/digest of the JSON data, without regard to the actual JSON formatting itself (e.g. ignoring newlines and spacing differences between the JSON tokens).

The last part is a requirement, as the JSON will be generated/read by a variety of (de)serializers on a number of different platforms. I know of at least one JSON library for Java that completely removes formatting when reading data during deserialization, so a hash computed over the original text would no longer match.

The arbitrary data clause above also complicates things, as it prevents me from taking known fields in a given order and concatenating them prior to hashing (think roughly how Java's non-cryptographic hashCode() method works).

Lastly, hashing the entire JSON String as a chunk of bytes (prior to deserialization) is not desirable either, since there are fields in the JSON that should be ignored when computing the hash.

I'm not sure there is a good solution to this problem, but I welcome any approaches or thoughts =)

Jason Nichols asked Jan 12 '11 15:01





1 Answer

The problem is a common one when computing hashes for any data format that allows the same data to be written in more than one way. To solve this, you need to canonicalize the representation.

For example, the OAuth 1.0a protocol, which is used by Twitter and other services for authentication, requires a signature over the request message. To compute it, OAuth 1.0a says you first percent-encode the request parameters, sort them by name, and join them as name=value pairs separated by "&"; that string is then combined with the HTTP method and the base URL. The signature or hash is computed on the result of that canonicalization.

XML DSIG works the same way - you need to canonicalize the XML before signing it. There is a W3C Recommendation (Canonical XML) covering this, because it's such a fundamental requirement for signing. Some people call it c14n.

I don't know of a canonicalization standard for JSON. It's worth researching.

If there isn't one, you can certainly establish a convention for your particular application usage. A reasonable start might be:

  • lexicographically sort the properties by name
  • double quotes used on all names
  • double quotes used on all string values
  • no space, or one-space, between names and the colon, and between the colon and the value
  • no spaces between values and the following comma
  • all other white space collapsed to either a single space or nothing - choose one
  • exclude any properties you don't want to sign (one example is, the property that holds the signature itself)
  • sign the result, with your chosen algorithm
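
The convention above could be sketched roughly as follows in Python. This is only an illustration under some assumptions: the names canonical_json and digest are made up for this example, SHA-256 is an arbitrary choice of algorithm, whitespace is collapsed to nothing, and excluded property names are dropped at every nesting level.

```python
import hashlib
import json


def canonical_json(obj, exclude=()):
    """Return canonical UTF-8 bytes for a parsed JSON value.

    Assumption for this sketch: names listed in `exclude` are removed
    from objects at every nesting level, not only at the top.
    """
    def strip(value):
        if isinstance(value, dict):
            return {k: strip(v) for k, v in value.items() if k not in exclude}
        if isinstance(value, list):
            return [strip(v) for v in value]
        return value

    text = json.dumps(
        strip(obj),
        sort_keys=True,          # sort properties by name
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # emit non-ASCII characters as UTF-8, not \u escapes
    )
    return text.encode("utf-8")


def digest(obj, exclude=()):
    """Hex SHA-256 digest of the canonical form."""
    return hashlib.sha256(canonical_json(obj, exclude)).hexdigest()


# Two renderings of the same data: different key order, whitespace,
# and a different value in the ignored "sig" property.
a = json.loads('{"b": 1, "a": {"y": [1, 2], "x": "hi"}, "sig": "zzz"}')
b = json.loads('{\n  "a": {"x": "hi", "y": [1,2]},\n  "sig": "other", "b": 1}')

print(canonical_json(a, exclude={"sig"}))
print(digest(a, exclude={"sig"}) == digest(b, exclude={"sig"}))
```

Sorting keys and fixing the separators only settles property order and whitespace. Number formatting (1 versus 1.0) and string escaping still depend on the serializer, so every platform involved would have to agree on those as well.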

You may also want to think about how to pass that signature in the JSON object - possibly establish a well-known property name, like "nichols-hmac" or something, that gets the base64 encoded version of the hash. This property would have to be explicitly excluded by the hashing algorithm. Then, any receiver of the JSON would be able to check the hash.
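
A minimal sketch of that idea, assuming a shared secret key and HMAC-SHA256 (neither is specified above; the helper names sign and verify are made up for this example). The signature property is excluded at the top level only, and the canonical form is sorted keys with no whitespace.

```python
import base64
import hashlib
import hmac
import json

SIGNATURE_FIELD = "nichols-hmac"  # the well-known property name suggested above


def _canonical_bytes(obj):
    """Canonical form of the object without its signature property."""
    unsigned = {k: v for k, v in obj.items() if k != SIGNATURE_FIELD}
    text = json.dumps(unsigned, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def _mac(obj, key):
    """Base64-encoded HMAC-SHA256 over the canonical bytes."""
    raw = hmac.new(key, _canonical_bytes(obj), hashlib.sha256).digest()
    return base64.b64encode(raw).decode("ascii")


def sign(obj, key):
    """Return a copy of obj with the signature property added."""
    signed = dict(obj)
    signed[SIGNATURE_FIELD] = _mac(obj, key)
    return signed


def verify(obj, key):
    """Check the signature property against a freshly computed one."""
    claimed = obj.get(SIGNATURE_FIELD)
    if not isinstance(claimed, str):
        return False
    # Compare as bytes, in constant time.
    return hmac.compare_digest(claimed.encode("utf-8"),
                               _mac(obj, key).encode("utf-8"))


key = b"shared-secret"  # hypothetical key, for illustration only
signed = sign({"b": 1, "a": [1, 2]}, key)

# A receiver that re-serializes with different formatting can still verify.
received = json.loads(json.dumps(signed, indent=4))
print(verify(received, key))
```

Any change to a signed property, or a different key, makes verify return False, while reformatting or reordering the JSON text does not.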

The canonicalized representation does not need to be the representation you pass around in the application. It only needs to be easily produced given an arbitrary JSON object.

Cheeso answered Sep 28 '22 16:09
