(1) After a process dies, is it possible that its pid is reassigned to another process created by spawn()?
(2) If so, how is any communication safe? For example, consider sending a reply to the Pid of a message's sender. If the sender has crashed, how can we know that this Pid doesn't now belong to another process?
(3) What guarantees are there on pid reuse? For example, is there a minimum interval before reusing a pid?
(4) What is usually done to prevent a bug due to pid reuse? Is it simply ignored?
Erlang processes are lightweight, operate in (memory) isolation from other processes, and are scheduled by Erlang's Virtual Machine (VM). The creation time of a process is very low, the memory footprint of a just-spawned process is very small, and a single Erlang VM can have millions of processes running.
A process can terminate itself by calling one of the BIFs exit(Reason), erlang:error(Reason), erlang:error(Reason, Args), erlang:fault(Reason) or erlang:fault(Reason, Args). The process then terminates with reason Reason for exit/1 or {Reason,Stack} for the others.
The Erlang BIF spawn is used to create a new process: spawn(Module, Name, Args), where Name is a function exported from Module and Args is a list of arguments.
spawn() creates a new process and returns its pid. The new process starts executing in Module:Name(Arg1,...,ArgN), where the arguments are the elements of the (possibly empty) Args argument list. There exist a number of different spawn BIFs: spawn/1,2,3,4.
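As a minimal sketch of the two paragraphs above, the module below spawns a process with spawn/3 and observes how it terminates. The module name spawn_demo, the worker/2 function, and the reason boom are hypothetical names chosen for illustration. The worker waits for a stop message so that the monitor is in place before it exits.

```erlang
-module(spawn_demo).
-export([run/1, worker/2]).

%% Hypothetical worker: waits for 'stop', then terminates itself
%% either with exit/1 or with erlang:error/1.
worker(How, Reason) ->
    receive
        stop when How =:= exit  -> exit(Reason);
        stop when How =:= error -> erlang:error(Reason)
    end.

%% Spawns the worker, monitors it, tells it to stop, and returns
%% the exit reason carried by the monitor's 'DOWN' message.
run(How) ->
    %% spawn/3: module, exported function name, argument list.
    Pid = spawn(?MODULE, worker, [How, boom]),
    Ref = erlang:monitor(process, Pid),
    Pid ! stop,
    receive
        {'DOWN', Ref, process, Pid, ExitReason} ->
            ExitReason
    after 1000 ->
        timeout
    end.
```

Calling spawn_demo:run(exit) should return the bare reason, while spawn_demo:run(error) should return a {Reason, Stack} tuple, matching the description of exit/1 versus erlang:error/1 above.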
Quite a few questions...
Without getting exhaustive on the one hand and too pedantically handwavy on the other, let's consider a practical example: the OTP function gen_server:call/2,3.
When you use gen_server:call/2,3, the gen_server module generates a combined message tag that looks like {self(), make_ref()}, in addition to monitoring the process being messaged. The sending process is guaranteed that if the process it is calling dies before a reply is sent, it will receive a monitor 'DOWN' message instead of the response, and that the PID in that message will match the one it just called. The receiving process gets both the PID of the sender and an Erlang reference that is guaranteed to be locally unique (at least for a reasonably long time; I believe the space of uniqueness is somewhere in the billions). When the receiving process sends its response, it must include this reference and address the reply to the PID that originally sent the call.
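The mechanism described above can be sketched roughly as follows. This is a simplified illustration of the idea, not the actual gen_server source, and the real implementation differs in detail. The module name call_sketch, the echo server loop/0, and the message shapes are assumptions made for this example.

```erlang
-module(call_sketch).
-export([start/0, call/2, loop/0]).

%% Starts the hypothetical echo server.
start() ->
    spawn(?MODULE, loop, []).

%% The server replies to the pid found in the tag and includes the
%% reference, so the caller can tell this reply from any other message.
loop() ->
    receive
        {call, {From, Ref}, Request} ->
            From ! {Ref, {echo, Request}},
            loop();
        stop ->
            ok
    end.

%% The caller monitors the server, tags the request with
%% {self(), Ref}, and then accepts only a reply carrying that same Ref
%% or a 'DOWN' message for that same monitor.
call(Server, Request) ->
    Ref = erlang:monitor(process, Server),
    Server ! {call, {self(), Ref}, Request},
    receive
        {Ref, Reply} ->
            erlang:demonitor(Ref, [flush]),
            {ok, Reply};
        {'DOWN', Ref, process, Server, Reason} ->
            {error, Reason}
    after 5000 ->
        erlang:demonitor(Ref, [flush]),
        {error, timeout}
    end.
```

A process that happened to inherit the caller's PID would still need to be waiting on the same reference to mistake the reply for its own, which is the point the answer goes on to make.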
It is possible (though extremely unlikely) that the sending process could have died and a new process could have been respawned with the same PID, but it is very close to impossible for another process to be spawned with the same PID and be blocking on a gen_server:call/2,3 message that happens to have an identical internal runtime reference as the old, dead call.
In addition to this near impossibility, let's consider a world where this one totally weird thing actually happened and all safeguards failed... (on the order of 2^64 * 2^64 * chance_of_failure_on_this_tiny_scale())
The sending process would get a weird response message, almost certainly fail an assertion match and die on the next line, and restart in a known state. The odds of that same problem happening twice are probably lower than those of a proton decaying within the next few minutes.
Is this "correctness"? No. There isn't such a thing as provable correctness in a massively concurrent system. That's like trying to "prove" a single equation that represents all of humanity. Most Erlang systems are chaotic by nature and so generally defy proofing as systems. What you can prove is that individual pure functions are correct, and that all the functions that a side-effecty process may call in its lifetime have definite termination conditions, including crashing on wonky data. That last part is how Erlang achieves such profound robustness as a system (well, good coding practices, adherence to functional principles and a strong culture of using Dialyzer help a lot, too).
So... "correctness"... prove that on functions, as much as you can. It is a good thing and why we have tools like PropEr and QuickCheck. As a general set of guidelines, try your best to write:
[...] = on it. This is why Erlang's = is assignment, assertion, and unification all in one.
Proofing at a scale larger than a function is a fool's errand unless you are in academia (that function may be calling a huge world of stuff beneath, so this isn't really much of a limitation). Proofing protocols for impossible or locked conditions is also possible, and if you have the time for this, go for it (otherwise do what the rest of us mortals do: stick to timeouts and rework code that has actually timed out on calls in its past; this shouldn't be a regular occurrence).
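The role of = as both binding and assertion, mentioned above, can be illustrated with a small sketch. The module name match_demo and the functions parse/1, check/1 and safe_parse/1 are hypothetical and exist only for this example.

```erlang
-module(match_demo).
-export([parse/1, safe_parse/1]).

%% The match below binds Value when check/1 returns {ok, Value}
%% and crashes with a badmatch error for anything else.
parse(Input) ->
    {ok, Value} = check(Input),
    Value.

%% Hypothetical validator: accepts non-negative integers only.
check(N) when is_integer(N), N >= 0 -> {ok, N};
check(_) -> {error, bad_input}.

%% Shows what the crash looks like when it is caught. In a real system
%% the process would normally just die and be restarted instead.
safe_parse(Input) ->
    try parse(Input) of
        Value -> {ok, Value}
    catch
        error:{badmatch, Bad} -> {crashed, Bad}
    end.
```

Here match_demo:parse(42) returns 42, while match_demo:parse(-1) fails the match and crashes the calling process, which is the "fail an assertion match and die" behaviour described earlier.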
All that said... Steve-O is almost certain to trip over a data cable in the data center and split a cluster many more times in the next two years than anyone is ever likely to see a PID wraparound cause an actual conflict in the next decade.