
Is Erlang's external term format definition stable? If not, what to use?

Tags: erlang

The Erlang external term format has changed at least once (but this change appears to predate the history stored in the Erlang/OTP GitHub repository); clearly, it could change again in the future.

However, as a practical matter, is it generally considered safe to assume that this format is stable now? By "stable," I mean specifically that, for any term T, term_to_binary will return the same binary in any current or future version of Erlang (not merely whether it will return a binary that binary_to_term will convert back to a term identical to T). I'm interested in this property because I'd like to store hashes of arbitrary Erlang terms on disk and I want identical terms to have the same hash value now and in the future.

If it isn't safe to assume that the term format is stable, what are people using for efficient and stable term serialization?

willb asked Nov 27 '11 05:11



3 Answers

It's been stated that Erlang will provide compatibility for at least 2 major releases. That would mean that BEAM files, the distribution protocol, the external term format, etc. from R14 will work at least up to R16.

"We have as a strategy to at least support backwards compatibility 2 major releases back in time."

"In general, we only break backward compatibility in major releases and only for a very good reason and usually after first deprecating the feature one or two releases beforehand."

butter71 answered Nov 01 '22 08:11



erlang:phash2 is guaranteed to be a stable hash of an Erlang term.
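A minimal sketch of how that could be used for on-disk hashes. The module and function names here are only for illustration. Keep in mind that phash2/1 returns a value in 0..2^27-1 and phash2/2 accepts a range of at most 2^32, so the hash is small and collisions are to be expected for large sets of terms.

```erlang
-module(term_hash_sketch).
-export([hash/1, hash/2]).

%% Hash any term into the default range 0..2^27-1.
hash(Term) ->
    erlang:phash2(Term).

%% Hash any term into 0..Range-1. erlang:phash2/2 accepts a Range
%% of at most 2^32 (4294967296).
hash(Term, Range) when is_integer(Range), Range >= 1, Range =< 4294967296 ->
    erlang:phash2(Term, Range).
```

For example, term_hash_sketch:hash({user, <<"willb">>}, 4294967296) gives a 32-bit value that depends only on the term, not on how the term is serialized.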

I don't think OTP makes the guarantee that term_to_binary(T) in vX =:= term_to_binary(T) in vY. Lots of things could change if they introduce new term codes for optimized representations of things, or if we need to add Unicode strings to the ETF or something, or in the vanishingly unlikely future in which we introduce a new fundamental datatype. For an example of a change that has happened in the external representation only (stored terms compare equal, but are not byte-equal), see float_ext vs. new_float_ext.
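The float case can be reproduced on a single VM, as far as I know, by asking term_to_binary/2 for the old encoding with the minor_version option. This is only a sketch (the module name is made up): minor version 0 should produce the textual float_ext encoding (tag 99) and minor version 1 the 8-byte IEEE new_float_ext encoding (tag 70).

```erlang
-module(etf_float_sketch).
-export([encodings/1, same_term_different_bytes/1]).

%% Encode the same float with the old and the newer float encoding.
encodings(F) when is_float(F) ->
    Old = term_to_binary(F, [{minor_version, 0}]),  % float_ext, tag 99
    New = term_to_binary(F, [{minor_version, 1}]),  % new_float_ext, tag 70
    {Old, New}.

%% True when both binaries decode to the same float although the
%% bytes themselves differ.
same_term_different_bytes(F) ->
    {Old, New} = encodings(F),
    binary_to_term(Old) =:= binary_to_term(New) andalso Old =/= New.
```

A hash taken over Old would not match a hash taken over New, even though both binaries represent the same term.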

In practical terms, if you stick to atoms, lists, tuples, integers, floats and binaries, then you're probably safe with term_to_binary for quite some time. If the time comes that their ETF representation changes, then you can always write your own version of term_to_binary that doesn't change with the ETF.
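Such a home-grown encoder could look roughly like the sketch below. The tag bytes and the layout are invented for this example and are not any existing format; the point is only that the bytes are under your control. It handles just the types listed above, accepts proper lists only (an improper list crashes in length/1), and assumes the crypto application is available for the SHA-256 digest.

```erlang
-module(stable_enc_sketch).
-export([encode/1, digest/1]).

%% Private, self-defined encoding: one tag byte, then a 32-bit size
%% or element count, then the payload. Floats are fixed at 8 bytes.
encode(T) when is_atom(T) ->
    B = atom_to_binary(T, utf8),
    <<$a, (byte_size(B)):32, B/binary>>;
encode(T) when is_integer(T) ->
    %% Decimal text keeps bignums simple and unambiguous.
    B = integer_to_binary(T),
    <<$i, (byte_size(B)):32, B/binary>>;
encode(T) when is_float(T) ->
    <<$f, T:64/float>>;
encode(T) when is_binary(T) ->
    <<$b, (byte_size(T)):32, T/binary>>;
encode(T) when is_tuple(T) ->
    encode_seq($t, tuple_to_list(T));
encode(T) when is_list(T) ->
    encode_seq($l, T).

%% Encode a tag, the element count, then each element in order.
encode_seq(Tag, Elems) ->
    Body = iolist_to_binary([encode(E) || E <- Elems]),
    <<Tag, (length(Elems)):32, Body/binary>>.

%% Hash the private encoding instead of term_to_binary/1 output.
digest(Term) ->
    crypto:hash(sha256, encode(Term)).
```

Any other term type (maps, pids, funs, references) fails with function_clause, which is probably what you want: better a crash than a hash that silently changes later.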

archaelus answered Nov 01 '22 09:11



For data serialization, I usually choose between Google Protocol Buffers and JSON. Both of them are very stable. For working with these formats from Erlang I use Piqi, Erlson and mochijson2.

A big advantage of Protobuf and JSON is that they can be used from other programming languages by design, whereas the Erlang external term format is more or less specific to Erlang.

Note that the string representation of JSON is implementation-dependent (escaped characters, floating-point precision, whitespace, etc.), so two encoders can produce different bytes for the same value. For this reason it may not be suitable for your use case.

Protobuf is less straightforward to work with compared to schemaless formats but it is a very well-designed and powerful tool.

Here are a couple of other schemaless binary serialization formats to consider. I don't know how stable they are. It may turn out that the Erlang external term format is more stable.

  • https://github.com/uwiger/sext
  • https://github.com/TonyGen/bson-erlang
alavrik answered Nov 01 '22 07:11
