The Erlang external term format has changed at least once (although that change appears to predate the history stored in the Erlang/OTP GitHub repository), so it could clearly change again in the future.
However, as a practical matter, is it generally considered safe to assume that this format is stable now? By "stable," I mean specifically that, for any term T, term_to_binary will return the same binary in any current or future version of Erlang (not merely that it will return a binary that binary_to_term will convert back to a term identical to T). I'm interested in this property because I'd like to store hashes of arbitrary Erlang terms on disk, and I want identical terms to have the same hash value now and in the future.
If it isn't safe to assume that the term format is stable, what are people using for efficient and stable term serialization?
It has been stated that Erlang will provide compatibility for at least two major releases. That would mean that BEAM files, the distribution protocol, the external term format, etc. from R14 will work at least up to R16.
"We have as a strategy to at least support backwards compatibility 2 major releases back in time."
erlang:phash2 is guaranteed to be a stable hash of an Erlang term.
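As a minimal sketch of what that looks like in practice, the module below wraps erlang:phash2/2. The module name phash_demo and the function hash/1 are made up for illustration. Note that phash2 returns a small integer: the default range is 0..2^27-1 and the largest allowed range is 2^32, so it may be too narrow where collisions matter.

```erlang
-module(phash_demo).
-export([hash/1]).

%% Hash any term into the widest range erlang:phash2/2 accepts,
%% i.e. an integer in 0..2^32-1.
hash(Term) ->
    erlang:phash2(Term, 1 bsl 32).
```

Calling phash_demo:hash(T) twice on the same term yields the same integer, and the documentation promises the same value across machine architectures and ERTS versions.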
I don't think OTP makes the guarantee that term_to_binary(T) in vX =:= term_to_binary(T) in vY. Lots of things could change if they introduce new term codes for optimized representations of things, or if Unicode strings need to be added to the ETF, or in the vanishingly unlikely future in which a new fundamental datatype is introduced. For an example of a change that has happened in the external representation only (stored terms compare equal, but are not byte equal), see float_ext vs. new_float_ext.
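The float_ext / new_float_ext difference can be reproduced on a single runtime with the minor_version option of term_to_binary/2. This is only a sketch: the module name float_ext_demo and the function compare/1 are hypothetical, and it assumes the runtime in use still accepts {minor_version, 0}.

```erlang
-module(float_ext_demo).
-export([compare/1]).

%% Encode the same float with the old textual encoding and the newer
%% IEEE 754 encoding, then report {BytesDiffer, DecodedTermsEqual}.
compare(F) when is_float(F) ->
    Old = term_to_binary(F, [{minor_version, 0}]),  % float_ext, tag 99
    New = term_to_binary(F, [{minor_version, 1}]),  % new_float_ext, tag 70
    %% Both start with the version byte 131, followed by the tag.
    <<131, 99, _/binary>> = Old,
    <<131, 70, _/binary>> = New,
    {Old =/= New, binary_to_term(Old) =:= binary_to_term(New)}.
```

A hash taken over Old would not match a hash taken over New, even though both binaries decode to the same float.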
In practical terms, if you stick to atoms, lists, tuples, integers, floats and binaries, then you're probably safe with term_to_binary for quite some time. If the time comes that their ETF representation changes, you can always write your own version of term_to_binary that doesn't change with the ETF.
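A home-grown encoder of that kind might look roughly like the sketch below. Everything in it is an assumption made for illustration: the module name canon, the one-byte tags, and the 32-bit length prefixes are an arbitrary layout, not any standard format. It handles only the types listed above (proper lists only) and crashes on anything else.

```erlang
-module(canon).
-export([encode/1, hash/1]).

%% Encode a term with a fixed, self-defined layout that does not depend
%% on the ETF: a one-byte tag, a 32-bit big-endian size, then the payload.
encode(Term) ->
    iolist_to_binary(enc(Term)).

enc(A) when is_atom(A) ->
    B = atom_to_binary(A, utf8),
    [<<$a, (byte_size(B)):32>>, B];
enc(I) when is_integer(I) ->
    B = integer_to_binary(I),            % decimal text, any magnitude
    [<<$i, (byte_size(B)):32>>, B];
enc(F) when is_float(F) ->
    <<$f, F:64/float>>;                  % IEEE 754 double, big-endian
enc(B) when is_binary(B) ->
    [<<$b, (byte_size(B)):32>>, B];
enc(L) when is_list(L) ->                % length/1 fails on improper lists
    [<<$l, (length(L)):32>> | [enc(E) || E <- L]];
enc(T) when is_tuple(T) ->
    [<<$t, (tuple_size(T)):32>> | [enc(E) || E <- tuple_to_list(T)]].

%% SHA-256 over the fixed encoding; needs the crypto application.
hash(Term) ->
    crypto:hash(sha256, encode(Term)).
```

Because the byte layout is defined by this module rather than by the runtime, canon:hash(T) stays the same for as long as the module itself is left unchanged.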
For data serialization, I usually choose between Google Protocol Buffers and JSON. Both of them are very stable. For working with these formats from Erlang I use Piqi, Erlson and mochijson2.
A big advantage of Protobuf and JSON is that they can be used from other programming languages by design, whereas the Erlang external term format is more or less specific to Erlang.
Note that JSON string representation is implementation-dependent (escaped characters, floating point precision, whitespace, etc.) and for this reason it may not be suitable for your use-case.
Protobuf is less straightforward to work with compared to schemaless formats but it is a very well-designed and powerful tool.
Here are a couple of other schemaless binary serialization formats to consider. I don't know how stable they are. It may turn out that Erlang external term format is more stable.