
How to get the memory location of a variable in Elixir?

Tags:

erlang

elixir

A fact that we know about Elixir is that the data structures living in memory are immutable, and variables are just pointers to those data structures.

Is there a way for us to get the memory address a variable is pointing to, instead of the content at that memory location (i.e. the dereferenced value of the variable)?


For me, the purpose of doing so is to learn how Elixir/Erlang manages memory when dealing with duplicated values, such as two identical charlists, or especially in cases where tuples and lists may share their contents, and to write more efficient code.

For instance, when you update a tuple, all entries are shared between the old and the new tuple, except for the entry that has been replaced. In other words, tuples and lists in Elixir are capable of sharing their contents.
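While there is no way to get a raw address, the sharing described above can be observed with `:erts_debug.same/2`, an undocumented debugging helper in Erlang/OTP that compares two terms by heap pointer rather than by value. A sketch (for exploration in `iex` only; `erts_debug` is not a stable API):

```elixir
# Build a tuple, then replace its first element.
old = {1, 2, [:a, :b, :c]}
new = put_elem(old, 0, 99)

# The two tuples themselves are distinct heap objects...
:erts_debug.same(old, new)                    # => false
# ...but the untouched list entry is literally the same object in both,
# confirming that put_elem/3 copied pointers, not contents:
:erts_debug.same(elem(old, 2), elem(new, 2))  # => true
```

`:erts_debug.size/1` can similarly report a term's flat size in machine words, which helps when estimating how much is actually shared.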

zetavg asked Oct 21 '17 18:10


1 Answer

TL;DR:

No, you cannot get the memory location of a variable.

Discussion

In principle everything is copied. Every process has its own heap. And that's just the way things are.
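The copy-by-default rule (and the large-binary exception covered in the next list) can be seen by sending a message to yourself and checking pointer identity with the undocumented `:erts_debug.same/2`. A sketch for exploration only, not a production technique:

```elixir
# A runtime-built list is copied when sent as a message, even to self():
list = Enum.to_list(1..10)
send(self(), list)
received = receive do msg -> msg end
received == list                     # => true  (equal values)
:erts_debug.same(list, received)     # => false (a distinct copy)

# A large (> 64 bytes) binary lives off-heap and is sent by reference:
bin = :binary.copy("x", 100)
send(self(), bin)
received_bin = receive do msg -> msg end
:erts_debug.same(bin, received_bin)  # => true  (the same shared object)
```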

In reality there are a few underlying speed hacks. The most notable are

  • Literals known at compile time are referenced from the global heap (which in some cases is a huge performance gain).
  • Binaries larger than 64 bytes are referenced from the global heap (which also causes binaries to be a leaky abstraction, hence binary:copy/1,2).
  • Updates to most structures do not actually require copying the whole structure (of particular interest is what goes on inside maps) -- but how much and when copying is necessary is ever changing as more efficiency work goes into the runtime.
  • Garbage collection occurs per process which is why Erlang appears to have a magically advanced incremental GC scheme, but actually has quite boring generational heap collection underneath (in the general case, that is; the approach is actually somewhat of a hybrid -- one more part of the ever-evolving landscape of EVM performance enhancements...).
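The binary point above can be made concrete with `:binary.referenced_byte_size/1`: a small slice of a large refc binary keeps the whole underlying buffer alive until you `:binary.copy/1` it. A sketch in `iex` (the 64-byte threshold is an implementation detail):

```elixir
# A 1000-byte binary is stored off-heap (refc) because it exceeds 64 bytes.
big = :binary.copy("x", 1000)

# Taking a 10-byte slice creates a sub-binary that still references
# the full 1000-byte buffer -- the "leaky abstraction" mentioned above:
slice = :binary.part(big, 0, 10)
:binary.referenced_byte_size(slice)    # => 1000

# :binary.copy/1 materializes an independent 10-byte binary, so the
# large buffer can be garbage collected once `big` goes out of scope:
trimmed = :binary.copy(slice)
:binary.referenced_byte_size(trimmed)  # => 10
```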

If you're going to be writing code for the EVM, in whatever language, you should abandon the idea that you're going to outsmart the runtime. This is for exactly the same reason that trying to outsmart the majority of C (and especially C++) compiler optimizations is a forbidden practice almost everywhere.

Every major release includes some newly implemented performance enhancement that does not break the language's assumptions. If you start writing code that is "more efficient" against some particular underlying memory scheme on R20 you might get some tiny performance boost here or there today, but when R21 comes out there is a strong chance that all of your code will be broken and you'll just be stuck with R20 forever.

Just consider the R20.0 release announcement. Keeping up with changes of this nature would consume most of your development time.

Some projects make trying to backhack a runtime their entire purpose. Consider Twisted, for example. Such projects exist specifically so that all that (large and non-trivial) effort does not have to be duplicated in every project downstream. With this in mind, the Erlang runtime, the Core compiler, the LFE project, the Elixir project, etc. are themselves the place for such speed hacks, which absolutely do not belong in downstream client code. The happy thing to note here (and Yes! There is a happy ending to my stern story!) is that this is exactly what we see happening.

A note on "efficiency"

What sort of efficiency are you after? Cycles? Bus traffic? Cache misses? Financial cost? I/O-ops elimination/bypassing/writethrough? More general selective buffer handling? etc.

Unless you're writing the front end for a super tight game engine on known hardware that needs to be efficient this year (because next year's hardware will dwarf most speed hacks anyway), paying for more CPU time is much less expensive than the developer time necessary to figure out what is happening within a massively concurrent runtime with thousands of processes sending millions of ephemeral messages around, all doing their own garbage collection at different times.

The case where someone might want to "know what is happening" that I've seen most commonly is cases where people are trying to use Elixir as "a faster Ruby" and have written not a massively concurrent system, but one massive single-threaded program on top of the EVM. That approach to writing "fast" programs on the Erlang runtime totally misses the point.

The case where you have a very specific CPU-intensive task that absolutely needs blazing speed calls for one of

  • A port written in Rust or C
  • A NIF written in Rust or C
  • A high-performance compute node that can communicate over the network to your main node(s), written in some language suited perfectly to your compute-heavy task (BERT is quite helpful there)
  • Waiting around for a year or two for the runtime to adopt more performance enhancements and hardware to get faster -- the rate of this form of speed increase is totally insane for concurrent systems, especially if you're running on your own hardware (if you are running in "the cloud", of course, these improvements are benefiting the provider but not you, and even then it is cheaper to let yourself get fleeced for more instances than try to outwit the runtime)

Writing separate programs (or NIFs) allows whatever developer or team is working on a specific problem to work in a single, unified, uniform problem space, concerned only with fulfilling whatever protocol contract they have been handed by the main project. That is dramatically more efficient than having an Erlang or LFE or Elixir dev/team flip between writing some Erlang, then some LangX, then some Erlang again, context switching as a matter of course (or worse, context switching between one language in which they are expert and one in which they are inexperienced and naive).

(Keep in mind also that NIFs should be regarded as a last resort, considered only when you know your actual case: specifically, when your per-call workload is small, predictable, and well bounded, and call overhead, rather than processing speed inside the NIF, is your bottleneck. NIFs destroy every safety guarantee of the runtime. A crashed NIF is a crashed runtime -- exactly the sort of problem Erlang was designed to avoid. Anyone who feels comfortable enough in C to flippantly recommend C NIFs all over the place clearly lacks enough experience in C to be writing those NIFs.)

At the project level, efficiency concerns are primarily business decisions, not technical ones. It is a bad business (or community management, in the case of community FOSS) decision to try to outsmart the runtime.

zxq9 answered Oct 17 '22 23:10