Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to compare two structures via pointer equality in Elixir / Erlang

Tags:

erlang

elixir

(Example given in Elixir.)

Suppose I have the following code,

x = {1, 2}
a1 = {"a", {1, 2}}
a2 = {"a", {1, 2}}
a3 = {"a", x}

which as far as I know creates three tuples {1, 2} at different memory locations.

Using the operators == or === for comparing any of the a variables always returns true. This is expectable, as these two operators only differ when comparing numeric types (i.e., 1 == 1.0 is different to 1 === 1.0).

So, I then tried comparing the structures via pattern matching, using the following module (strictly created to test my case),

defmodule Test do
  def same?({x, y}, {x, y}), do: true
  def same?(_, _), do: false
end

but calling Test.same?(a1, a3) also returns true.

How can I compare two structures using pointer equality, so that I can determine if they are the same structure in memory?

Thanks

like image 279
mljrg Avatar asked Sep 10 '18 13:09

mljrg


3 Answers

There is no "official" way to do this, and I would say that if you think you actually need to do this, you're doing something wrong and should ask another question about how to achieve the goal you want to achieve. So this answer is offered in the spirit of playfulness and exploration, in the hope that it spreads some interesting knowledge about the Erlang/Elixir VM.


There is a function, erts_debug:size/1, that tells you how many memory "words" an Erlang/Elixir term occupies. This table tells you how many words various terms use. In particular, a tuple uses 1 word, plus 1 word for each element, plus the storage space for any elements that are "non-immediate". We're using small integers as elements, and they are "immediates" and thus "free". So this checks out:

> :erts_debug.size({1,2})
3

Now let's make a tuple containing two of those tuples:

> :erts_debug.size({{1,2}, {1,2}})
9

That makes sense: the two inner tuples are 3 words each, and the outer tuple is 1+2 words, for a total of 9 words.

But what if we put the inner tuple in a variable?

> x = {1, 2}
{1, 2}
> :erts_debug.size({x, x})
6

Look, we saved 3 words! That's because the contents of x only counts once; the outer tuple points to the same inner tuple twice.

So let's write a little function that does this for us:

defmodule Test do
  def same?(a, b) do
    a_size = :erts_debug.size(a)
    b_size = :erts_debug.size(b)
    # Three words for the outer tuple; everything else is shared
    a_size == b_size and :erts_debug.size({a,b}) == a_size + 3
  end
end

System working? Seems to be:

> Test.same? x, {1,2}
false
> Test.same? x, x
true

Goal accomplished!


However, say we're trying to call this function from another function in a compiled module, not from the iex shell:

  def try_it() do
    x = {1, 2}
    a1 = {"a", {1, 2}}
    a2 = {"a", {1, 2}}
    a3 = {"a", x}

    IO.puts "a1 and a2 same? #{same?(a1,a2)}"
    IO.puts "a1 and a3 same? #{same?(a1,a3)}"
    IO.puts "a3 and a2 same? #{same?(a3,a2)}"
  end

That prints:

> Test.try_it
a1 and a2 same? true
a1 and a3 same? true
a3 and a2 same? true

That's because the compiler is smart enough to see that those literals are equal, and coalesces them to one term while compiling.


Note that this sharing of terms is lost when terms are sent to another process, or stored in / retrieved from an ETS table. See the Process Messages section of the Erlang Efficiency Guide for details.

like image 138
legoscia Avatar answered Oct 10 '22 05:10

legoscia


Erlang/OTP 22 (and possibly earlier) provides :erts_debug.same/2, which will allow you to do the desired memory pointer test. However, note the function is undocumented and in a module named erts_debug, so you should only rely on it for debugging and testing, and never in production code.

In my almost 9 years using Erlang/Elixir, I have only used it once, which is to test that we are not needlessly allocating structs in Ecto. Here is the commit for reference.

like image 35
José Valim Avatar answered Oct 10 '22 05:10

José Valim


Let me answer my question:

There is no need for developers to do pointer comparison explicitly, because Elixir already does that internally, in pattern matching and in operators == and === (via the corresponding Erlang operators).

For example, given

a1 = {0, {1, 2}}
a2 = {1, {1, 2}}
x = {a1, a2}
s = {1, 2}
b1 = {0, s}
b2 = {1, s}
y = {b1, b2}

in IEx we have

Interactive Elixir (1.7.3) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> a1 = {0, {1, 2}}
{0, {1, 2}}
iex(2)> a2 = {1, {1, 2}}
{1, {1, 2}}
iex(3)> x = {a1, a2}
{{0, {1, 2}}, {1, {1, 2}}}
iex(4)> s = {1, 2}
{1, 2}
iex(5)> b1 = {0, s}
{0, {1, 2}}
iex(6)> b2 = {1, s}
{1, {1, 2}}
iex(7)> y = {b1, b2}
{{0, {1, 2}}, {1, {1, 2}}}
iex(8)> :erts_debug.size(x)
15
iex(9)> :erts_debug.size(y)
12
iex(10)> x == y
true
iex(11)> x === y
true

That is, x and y are content equal, but memory different, because y occupies less memory than x as it internally shares substructure s.

In short, == and === do both content and pointer comparison. Pointer comparison is the most efficient way for Erlang to avoid traversing the same substructure on both sides of the comparison, thus saving lots of time for large shared substructures.

Now, if structural duplication across two structures is a concern, like when they are loaded from two large files with similar content, then one must compress them into two new structures sharing the parts in which they are content equal. This was the case of a1 and a2 which were compressed as b1 and b2.

like image 39
mljrg Avatar answered Oct 10 '22 06:10

mljrg