Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Normalize Small Probabilities in Python

I have a list of probabilities, which I need to normalize to equal 1.0.
e.g. probs = [0.01,0.03,0.005]

I realize that this is done by dividing each probability by the sum of probs. However, if the probabilities become really small, Python will tell me that sum(probs)=0.0. I understand that this is an underflow issue. I suppose I should use the log of each probability. How would I do this?

like image 774
Adam_G Avatar asked Jan 10 '23 10:01

Adam_G


1 Answers

The sum of even very small floating point values will never truly be 0; they may be close to zero, but can never be exactly zero.

Just divide 1 by their sum, and multiply the probabilities by that factor:

def normalize(probs):
    prob_factor = 1 / sum(probs)
    return [prob_factor * p for p in probs]

Some probabilities may make up but a very small percentage in the total sum, of course, and that percentage may approach zero. But this just means that when normalising you may end up with normalized probabilities that are either very close to zero, or if smaller than the smallest representable floating point value, equal to zero. The latter only happens if there are probabilities in the list that are so much smaller than the others that they no longer represent anything close to something that'll ever occur.

Demo:

>>> def normalize(probs):
...     prob_factor = 1 / sum(probs)
...     return [prob_factor * p for p in probs]
... 
>>> normalize([0.0000000001,0.000000000003,0.000000000000005])
[0.9708266589000533, 0.029124799767001597, 4.854133294500266e-05]

And the extreme case:

>>> import sys
>>> normalize([sys.float_info.max, sys.float_info.min])
[0.9999999999999999, 0.0]
>>> normalize([sys.float_info.max, sys.float_info.min])[-1] == 0
True
like image 143
Martijn Pieters Avatar answered Jan 21 '23 12:01

Martijn Pieters