Once again I was in a design review, and encountered the claim that the probability of a particular scenario was "less than the risk of cosmic rays" affecting the program, and it occurred to me that I didn't have the faintest idea what that probability is.
"Since 2⁻¹²⁸ is 1 out of 340282366920938463463374607431768211456, I think we're justified in taking our chances here, even if these computations are off by a factor of a few billion... We're way more at risk for cosmic rays to screw us up, I believe."
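A quick sanity check of the number in that quote. This is a minimal sketch: Python integers are arbitrary precision, so 2**128 can be printed exactly, and `fractions.Fraction` keeps 2⁻¹²⁸ exact before it is converted to a float for comparison with other probabilities.

```python
from fractions import Fraction

# 2**128 written out in full; Python ints are exact, so no rounding occurs.
denominator = 2 ** 128
print(denominator)

# The probability 2**-128 as an exact fraction, then as a float (about 2.9e-39).
p = Fraction(1, denominator)
print(float(p))
```

The first line printed should match the 39-digit number in the quote digit for digit.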
Is this programmer correct? What is the probability of a cosmic ray hitting a computer and affecting the execution of the program?
Every second, 100,000 high-energy cosmic-ray particles from distant parts of the Galaxy hit each square metre of the Earth's atmosphere. Some of these energetic particles zap computer chips, leading to once-only glitches or 'soft fails'.
Cosmic-ray nucleons and muons can cause errors in current memories at a level of marginal significance, and there may be a very significant effect in the next generation of computer memory circuitry.
Beyond Low Earth Orbit, space radiation may place astronauts at significant risk for radiation sickness, and increased lifetime risk for cancer, central nervous system effects, and degenerative diseases.
Cosmic rays -- or rather the electrically charged particles they generate -- may be your real foe. While harmless to living organisms, a small number of these particles have enough energy to interfere with the operation of the microelectronic circuitry in our personal devices.
From Wikipedia:
Studies by IBM in the 1990s suggest that computers typically experience about one cosmic-ray-induced error per 256 megabytes of RAM per month.[15]
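This means a probability of 3.7 × 10⁻⁹ per byte per month, or 1.4 × 10⁻¹⁵ per byte per second. If your program runs for 1 minute (60 seconds) and occupies 20 MB of RAM, then the failure probability would be

$$1 - \left(1 - 1.4 \times 10^{-15}\right)^{60 \times 20 \times 1024^2} \approx 1.8 \times 10^{-6}$$

In other words, the program completes without a cosmic-ray error with a probability of about 0.999998, i.e. "5 nines".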
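The arithmetic above can be reproduced with a short script. This is a sketch under stated assumptions: a month is taken as 30 days, 1 MB as 1024² bytes, and each byte-second is treated as an independent chance of an upset (the same assumption behind the formula above). The helper name `failure_probability` is hypothetical. It evaluates 1 − (1 − r)ⁿ via `math.log1p` and `math.expm1` to avoid cancellation when r is tiny.

```python
import math

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month
BYTES_PER_MB = 1024 ** 2               # assumption: 1 MB = 1024**2 bytes

# IBM figure quoted above: about 1 error per 256 MB of RAM per month.
errors_per_byte_per_month = 1 / (256 * BYTES_PER_MB)
errors_per_byte_per_second = errors_per_byte_per_month / SECONDS_PER_MONTH


def failure_probability(rate_per_byte_second, megabytes, seconds):
    """Hypothetical helper: P(at least one upset), assuming independent byte-seconds."""
    exposures = seconds * megabytes * BYTES_PER_MB
    # 1 - (1 - r)**n == -expm1(n * log1p(-r)), numerically stable for tiny r
    return -math.expm1(exposures * math.log1p(-rate_per_byte_second))


p_fail = failure_probability(errors_per_byte_per_second, megabytes=20, seconds=60)

print(f"{errors_per_byte_per_month:.2e} errors per byte per month")
print(f"{errors_per_byte_per_second:.2e} errors per byte per second")
print(f"{p_fail:.2e} failure probability for a 1-minute, 20 MB run")
print("cosmic rays more likely than 2**-128:", p_fail > 2 ** -128)
```

Without rounding the per-second rate to 1.4 × 10⁻¹⁵, the result comes out at roughly 1.81 × 10⁻⁶, still about 1.8 × 10⁻⁶ and some 33 orders of magnitude larger than 2⁻¹²⁸.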
Error checking can reduce the impact of such failures. Also, as Joe commented, chips have become much more compact, so today's failure rate may differ from what it was 20 years ago.