What's the best way to consistently hash an object/dictionary that's limited to what JSON can represent, in both JavaScript and Python? What about in many different languages?
Of course there are hash functions implemented consistently in many different languages that take a string, but to hash an object you have to convert it to a string representation first.
I want a hash function that will always return the same value for the same dictionary in any language, but the JSON spec doesn't guarantee anything about the order of keys in the serialized representation.
Do json.dumps() and JSON.stringify() behave identically? How would you verify this?
If not, is there a serialization format with libraries in many languages (I'm practically interested in Python and JavaScript but also curious about all languages) that doesn't require any additional processing by the caller to produce consistent results?
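I would split this into two problems:
(1) Serialize the object to an identical JSON string in both languages.
(2) Hash a sequence of bytes identically in both languages.
Use (1) to get two strings, UTF-8 encode them, then use (2) to get the hashes.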
Since (2) is straightforward, I'll only address (1).
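To show why (2) is the easy half, here is a minimal Python sketch of the hashing step. SHA-256 and the helper name hash_text are my own assumptions for illustration; any hash function with implementations in both languages would work the same way.

```python
import hashlib

def hash_text(text: str) -> str:
    """Hash a string by UTF-8 encoding it first (hypothetical helper; SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# The hash only sees bytes, so a single optional space changes the result.
compact = hash_text('{"a":1}')
spaced = hash_text('{"a": 1}')
print(compact == spaced)  # False
```

The hash itself is deterministic everywhere; any mismatch between languages comes from the strings fed into it, which is why the rest of this answer is about serialization.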
There are multiple facets to the problem of making sure the two JSON strings you generate are identical.
Number formatting: make sure you don't end up with 1 on one side and 1.0 on the other. (This probably won't be as big of an issue, however.)
String escaping: JSON only requires that " and \ (plus control characters) be backslash-escaped in JSON strings. Most serializers do more than necessary, however, and reduce almost all non-ASCII characters to the \uXXXX equivalent. See json.org for the details on JSON string encoding. One way to remove all ambiguity is to only escape when absolutely necessary.
You'll want to make sure all of these are matched between JavaScript and Python. Most JSON serialization libraries I've used provide configuration hooks for all of the things I mention in the list above. Unfortunately, I'm not very familiar with the JavaScript or Python libraries.
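For the Python side, the standard json module does expose such hooks. The following is only a sketch under my own assumptions: canonical_json is a hypothetical helper name, and the particular settings (sorted keys, no optional whitespace, minimal escaping) are one possible convention that the JavaScript side would have to reproduce exactly.

```python
import json

def canonical_json(obj) -> str:
    """Serialize obj with each formatting choice pinned down (hypothetical helper)."""
    return json.dumps(
        obj,
        sort_keys=True,         # fixed key order instead of insertion order
        separators=(",", ":"),  # no optional whitespace
        ensure_ascii=False,     # do not turn non-ASCII characters into \uXXXX
        allow_nan=False,        # raise on NaN/Infinity, which are not valid JSON
    )

data = {"b": [1, 2], "a": "é"}

print(json.dumps(data))      # {"b": [1, 2], "a": "\u00e9"}
print(canonical_json(data))  # {"a":"é","b":[1,2]}

# Number formatting still needs care: Python keeps the float form.
print(json.dumps(1.0))       # 1.0
```

Note that JavaScript's JSON.stringify(1.0) produces 1, so floats with integral values would need to be normalized on one side before the strings can match.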