Trying to find documentation on details, I did not find a lot beyond:
To me, this leaves a lot unclear.
Is the atom word value always the same, independent of the order in which modules are loaded into a runtime instance? If modules A and B both define/reference some atoms, will the value of an atom change from session to session, depending on whether A or B was loaded first?
When matching for an atom inside a module, is there some "atom literal to atom value" resolution taking place? Do modules have their own module-local atom-value lookup table, which gets filled in when the module is loaded?
In a distributed scenario where two Erlang runtime instances communicate with each other, is there some "sync-atom-tables" action going on? Or do atoms get serialized as string literals instead of as words?
An atom is simply an ID maintained by the VM. The ID is represented as a machine integer of the underlying architecture, e.g. 4 bytes on 32-bit systems and 8 bytes on 64-bit systems. See the usage in the LYSE book.
The same atom in the same running VM is always mapped to the same ID (integer). For example the following tuple:
{apple, pear, cherry, apple}
could be stored as the following tuple in the actual Erlang memory:
{1, 2, 3, 1}
All atoms are stored in one big table which is never garbage-collected, i.e. once an atom is created in a running VM it stays in the table until the VM is shut down.
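The growth of the table can be observed from inside the VM. The following is a minimal sketch, assuming an OTP release that supports erlang:system_info(atom_count) (OTP 20 or later); the module and function names are made up for illustration.

```erlang
-module(atom_table_demo).
-export([count_after_new_atom/0]).

%% Returns {Before, After}: the number of atoms in the VM's atom table
%% before and after creating an atom that (very likely) did not exist yet.
count_after_new_atom() ->
    %% Build a name that is very unlikely to be in the table already.
    Name = "demo_atom_" ++ integer_to_list(erlang:unique_integer([positive])),
    Before = erlang:system_info(atom_count),
    %% list_to_atom/1 inserts the name into the global table if it is missing.
    _ = list_to_atom(Name),
    After = erlang:system_info(atom_count),
    {Before, After}.
```

After should be larger than Before, and it never goes back down while the VM is running. This is also why building atoms from arbitrary external input is usually discouraged: the table has a fixed upper limit, which erlang:system_info(atom_limit) reports.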
Answering your questions:
1. No. The ID of an atom can change between VM runs. If you shut down the VM and reload the tuple above, the system might end up with the following IDs:
{50, 51, 52, 50}
depending on what other atoms have been created before it was loaded. Atoms only live as long as the VM.
2. No. There is only one table of atoms per VM. All literal atoms in a module are mapped to their IDs when the module is loaded. If a particular atom doesn't yet exist in that table, it is inserted and stays there until the VM restarts.
3. No. Atom tables are per VM and they are separate. Consider two VMs started at the same time that don't know of each other. The same atom may get a different ID in each VM's table, so when one node later connects to the other, their IDs won't agree, and the tables can't easily be synchronized or merged. But atoms aren't simply sent as text to the other node every time either. They are "compressed" into a form of cache and sent all together in the header. See the distribution header in the description of the communication protocol. Basically, the header contains the atoms used in the terms that follow, with their IDs and textual representation. Each term then references an atom by the ID specified in the header rather than passing the same text each time.
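The fact that an atom travels by name, not by its VM-local ID, can be seen on a single node with term_to_binary/1. This is only a sketch: term_to_binary/1 produces the plain external term format and does not use the distribution header or its atom cache, so it shows the un-cached case. The module and function names are made up for illustration.

```erlang
-module(atom_wire_demo).
-export([name_on_wire/1, roundtrip/1]).

%% True if the atom's name appears verbatim inside its encoded form,
%% i.e. the encoding carries the text and not a VM-local table index.
name_on_wire(Atom) when is_atom(Atom) ->
    Encoded = term_to_binary(Atom),
    Name = atom_to_binary(Atom, utf8),
    binary:match(Encoded, Name) =/= nomatch.

%% True if decoding yields the same atom again. The receiving VM looks the
%% name up in (or inserts it into) its own atom table.
roundtrip(Atom) when is_atom(Atom) ->
    binary_to_term(term_to_binary(Atom)) =:= Atom.
```

For example, atom_wire_demo:name_on_wire(apple) returns true. The exact tag bytes around the name differ between OTP releases, so the sketch only checks for the name itself.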
To get really basic without going into implementation, an atom is a literal "thing" with a name. Its value is always itself and it knows its own name. You generally use it when you want the tag, like the atoms ok and error. Atoms are unique in the sense that there is only one atom foo in the system, and each time I refer to foo, I am referring to this same unique foo irrespective of whether they are in the same module, or whether they come from the same process. There is always only one foo.
A bit of implementation. Atoms are stored in a global atom table, and when you create a new atom, it is inserted into the table if it is not already there. This makes comparing atoms for equality very fast as you just check if the two atoms refer to the same slot in the atom table.
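A small sketch of both points: an atom is the same value however it was created, and a name can be looked up without inserting it. The module and function names here are made up for illustration; the BIFs used are standard ones.

```erlang
-module(atom_identity_demo).
-export([same_atom/0, lookup/1]).

%% The literal foo, and foo built at runtime from a string or a binary,
%% are all the one and only atom foo.
same_atom() ->
    A = foo,
    B = list_to_atom("foo"),
    C = binary_to_atom(<<"foo">>, utf8),
    (A =:= B) andalso (B =:= C).

%% Looks a name up without growing the atom table:
%% list_to_existing_atom/1 raises badarg if the atom does not exist yet.
lookup(Name) when is_list(Name) ->
    try list_to_existing_atom(Name) of
        Atom -> {ok, Atom}
    catch
        error:badarg -> not_found
    end.
```

atom_identity_demo:lookup("ok") returns {ok, ok}, while a name that no loaded code has ever used as an atom gives not_found.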
Separate instances of the VM (nodes) have separate atom tables, but communication between nodes in distributed Erlang is optimised for this, so very often you don't need to send the actual atom name between nodes.