
Can erlang reuse process IDs? If so, how to be sure of correctness?

Tags:

erlang

(1) After a process dies, is it possible that its pid is reassigned to another process created by spawn()?

(2) If so, how is any communication safe? For example, sending a reply to the Pid of the message's sender. If the sender has crashed, how can we know that this Pid doesn't belong now to another process?

(3) What guarantees are there on pid reuse? For example, is there a minimum interval before reusing a pid?

(4) What is usually done to prevent a bug due to pid reuse? Is it simply ignored?

Hossam El-Deen asked Sep 10 '17 06:09



1 Answer

Quite a few questions...

  1. Yes. PIDs can be reused.
  2. Communication is "safe(ish)" because the chances of hitting a reused PID with any consistency are infinitesimally small. Hitting a network or hardware error is profoundly more likely than this. We design programs for robustness in the face of such errors, and that includes programming in a way that can accept erroneous messages and/or processes spontaneously dying (the cost of such a death is built into the restart cycle).
  3. The guarantee, at least from what I have experienced with the main Erlang VM implementation, BEAM (I am not sure about HiPE, for example), rests on the fact that the integer assignment space is quite large and that sending messages is dramatically faster than integer wraparound in most cases.
  4. The key to avoiding weirdness based on PID reuse is to combine it with something else that is also a unique value within the typical span of use -- and that usually takes the shape of an Erlang reference.

Without getting exhaustive on the one hand and too pedantically handwavy on the other, let's consider a practical example: the OTP function gen_server:call/2,3.

When you use gen_server:call/2,3, the gen_server module monitors the process being called and generates a combined message tag that looks like {self(), make_ref()} (in the OTP code the reference is the one returned by the monitor). If the called process dies before it sends a reply, the caller is guaranteed to receive a monitor 'DOWN' message instead of the response, and the PID in that message will match the one it just called. The called process receives both the PID of the caller and an Erlang reference that is guaranteed to be locally unique (at least for a reasonably long time -- I believe the space of uniqueness is somewhere in the billions). To reply, the called process must include this reference and address the reply to the PID that made the call.
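The pattern described above can be sketched in a few lines of plain Erlang. This is a simplified illustration, not the actual gen_server source: the module name tagged_call, the {call, From, Request} message shape, and the echo server are all hypothetical, and the monitor reference is assumed to double as the unique tag.

```erlang
-module(tagged_call).
-export([start/0, call/2, call/3]).

%% Hypothetical echo server: answers {call, {Caller, Tag}, Request}
%% by sending {Tag, {ok, Request}} back to Caller.
start() ->
    spawn(fun loop/0).

loop() ->
    receive
        {call, {Caller, Tag}, Request} ->
            Caller ! {Tag, {ok, Request}},
            loop();
        stop ->
            ok
    end.

call(Server, Request) ->
    call(Server, Request, 5000).

call(Server, Request, Timeout) ->
    %% The monitor reference is unique to this one call, so it is
    %% used as the tag that the reply must carry.
    Mref = erlang:monitor(process, Server),
    Server ! {call, {self(), Mref}, Request},
    receive
        {Mref, Reply} ->
            %% Only a reply carrying our tag is accepted.
            erlang:demonitor(Mref, [flush]),
            Reply;
        {'DOWN', Mref, process, Server, Reason} ->
            %% The server died (or was already dead) before replying.
            exit({server_down, Reason})
    after Timeout ->
        erlang:demonitor(Mref, [flush]),
        exit(timeout)
    end.
```

A process that happened to inherit the caller's PID would not be waiting on Mref, so a late reply would simply sit unmatched in its mailbox rather than be mistaken for an answer.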

It is possible (though extremely unlikely) that the sending process could have died and a new process could have been respawned with the same PID, but it is very close to impossible for another process to be spawned with the same PID and be blocking on a gen_server:call/2,3 message that happens to have an identical internal runtime reference as the old, dead call.

In addition to this near impossibility, let's consider a world where this one totally weird thing actually happened and all safeguards failed...

(on the order of 2^64 * 2^64 * chance_of_failure_on_this_tiny_scale())

The sending process would get a weird response message, almost certainly fail an assertion match and die on the next line, and then restart in a known state. The odds of that same problem happening twice are probably lower than those of a proton decaying within the next few minutes.
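Both safeguards can be shown in one small sketch: a selective receive that only accepts the current reference, and an assertion match that crashes on an unexpected reply. The module and function names (stale_reply, demo/0, strict/1) are made up for illustration, and the stale message is simulated by sending to self().

```erlang
-module(stale_reply).
-export([demo/0, strict/1]).

%% Hypothetical demo: two replies sit in the mailbox, but only one
%% carries the reference of the call we are currently waiting on.
demo() ->
    OldRef = make_ref(),
    NewRef = make_ref(),
    self() ! {OldRef, {ok, stale}},
    self() ! {NewRef, {ok, fresh}},
    %% Selective receive skips the message tagged with OldRef.
    Reply = receive
                {NewRef, R} -> R
            after 1000 ->
                timeout
            end,
    Value = strict(Reply),
    %% The stale message is still queued; drain it explicitly.
    Leftover = receive
                   {OldRef, Old} -> Old
               after 0 ->
                   none
               end,
    {Value, Leftover}.

%% Assertion match: anything other than {ok, Value} raises badmatch,
%% which kills the calling process unless something catches it.
strict(Reply) ->
    {ok, Value} = Reply,
    Value.
```

Under a supervisor, the badmatch crash is what returns the process to a known state.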

Is this "correctness"? No. There isn't such a thing as provable correctness in a massively concurrent system. That's like trying to "prove" a single equation that represents all of humanity. Most Erlang systems are chaotic by nature and so generally defy proof as systems. What you can prove is that individual pure functions are correct, and that all the functions that a side-effecty process may call in its lifetime have definite termination conditions, including crashing on wonky data. That last part is how Erlang achieves such profound robustness as a system (well, good coding practices, adherence to functional principles and a strong culture of using Dialyzer help a lot, too).

So... "correctness"... prove that on functions, as much as you can. It is a good thing and why we have tools like PropEr and QuickCheck. As a general set of guidelines, try your best to write:

  • Functions that are pure as often as possible. Have the side-effecty code be as isolated in code as possible from the pure code that does nothing but compute and return values.
  • Processes that are provably crashable. Make every line have an = on it. This is why Erlang's = (strictly, a pattern match) serves as assignment, assertion, and unification all in one.
  • Protocols that have provable states. You can't make two identical processes call each other in a blocking way without a risk of a deadlock, for example. This is a fundamental limitation of concurrent systems. The CAP theorem is another. Design your systems against these constraints (which is oddly liberating) and consciously choose your tradeoffs.
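The first two guidelines might look like this in practice. This is a minimal sketch with invented names (crashable, total/1, the {sum, From, Ref, Prices} message): the pure function only computes, while the process asserts on its input with = and dies on anything unexpected.

```erlang
-module(crashable).
-export([total/1, start/0]).

%% Pure: computes and returns a value, no side effects.
total(Prices) ->
    lists:sum(Prices).

%% Side-effecty: receives and sends messages, kept apart from total/1.
start() ->
    spawn(fun loop/0).

loop() ->
    receive
        {sum, From, Ref, Prices} ->
            %% Assertion: wonky data causes a badmatch crash here
            %% instead of propagating further.
            true = is_list(Prices),
            From ! {Ref, total(Prices)},
            loop()
    end.
```

Because total/1 has no side effects it can be tested in isolation, and the process around it has exactly one way to fail on bad input: a badmatch that a monitor or supervisor can observe.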

Proving anything at a scale larger than a function is a fool's errand unless you are in academia (that function may be calling a huge world of stuff beneath it, so this isn't really much of a limitation). Proving that a protocol has no impossible or locked conditions is also possible, and if you have the time for this, go for it (otherwise do what the rest of us mortals do: stick to timeouts, and rework any code that has actually timed out on calls in the past -- this shouldn't be a regular occurrence).
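As a rough illustration of the "stick to timeouts" approach, the sketch below wraps gen_server:call/3 with an explicit timeout. The module slow_server, its ask/3 wrapper, and the {sleep, Ms} request are hypothetical. Converting the timeout exit into {error, timeout} is just one option; letting the exit propagate and crash the caller is equally common.

```erlang
-module(slow_server).
-behaviour(gen_server).
-export([start/0, ask/3]).
-export([init/1, handle_call/3, handle_cast/2]).

start() ->
    gen_server:start(?MODULE, [], []).

%% gen_server:call/3 exits with {timeout, _} when no reply arrives
%% within Timeout milliseconds; here that exit becomes a plain value.
ask(Server, Request, Timeout) ->
    try
        gen_server:call(Server, Request, Timeout)
    catch
        exit:{timeout, _} -> {error, timeout}
    end.

init([]) ->
    {ok, nostate}.

%% {sleep, Ms} simulates a handler that takes too long to answer.
handle_call({sleep, Ms}, _From, State) ->
    timer:sleep(Ms),
    {reply, awake, State};
handle_call(ping, _From, State) ->
    {reply, pong, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

Depending on the OTP release, a reply that arrives after the timeout may still land in the caller's mailbox, so long-lived callers that catch timeouts should be prepared to discard unexpected messages.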

All that said... Steve-O is almost certain to trip over a data cable in the data center and split a cluster many more times in the next two years than anyone is ever likely to see a PID wraparound cause an actual conflict in the next decade.

zxq9 answered Oct 03 '22 04:10