I have a list of probabilities, which I need to normalize to equal 1.0.
e.g. probs = [0.01,0.03,0.005]
I realize that this is done by dividing each probability by the sum of probs
. However, if the probabilities become really small, Python will tell me that sum(probs)=0.0
. I understand that this is an underflow issue. I suppose I should use the log of each probability. How would I do this?
The sum of even very small floating point values will never truly be 0; they may be close to zero, but can never be exactly zero.
Just divide 1 by their sum, and multiply the probabilities by that factor:
def normalize(probs):
prob_factor = 1 / sum(probs)
return [prob_factor * p for p in probs]
Some probabilities may make up but a very small percentage in the total sum, of course, and that percentage may approach zero. But this just means that when normalising you may end up with normalized probabilities that are either very close to zero, or if smaller than the smallest representable floating point value, equal to zero. The latter only happens if there are probabilities in the list that are so much smaller than the others that they no longer represent anything close to something that'll ever occur.
Demo:
>>> def normalize(probs):
... prob_factor = 1 / sum(probs)
... return [prob_factor * p for p in probs]
...
>>> normalize([0.0000000001,0.000000000003,0.000000000000005])
[0.9708266589000533, 0.029124799767001597, 4.854133294500266e-05]
And the extreme case:
>>> import sys
>>> normalize([sys.float_info.max, sys.float_info.min])
[0.9999999999999999, 0.0]
>>> normalize([sys.float_info.max, sys.float_info.min])[-1] == 0
True
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With