The following trivial example raises a "singular matrix" error. Why? Is there any way to overcome it?
In: from scipy.stats import gaussian_kde
In: points
Out: (array([63, 84]), array([46, 42]))
In: gaussian_kde(points)
LinAlgError: singular matrix
Looking at the backtrace, you can see that it fails when inverting the covariance matrix. This is due to exact multicollinearity in your data. From the page, you have multicollinearity in your data if two variables are collinear, i.e. if
the correlation between two independent variables is equal to 1 or -1
In this case, each of the two variables has only two samples, so they are always collinear (trivially, there is always exactly one line passing through two distinct points). We can check that:
np.corrcoef(np.array([63, 84]), np.array([46, 42]))
[[ 1. -1.]
 [-1.  1.]]
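The link between a correlation of -1 and the failed inversion can be checked directly. The following is a minimal sketch, assuming the two arrays from the question are stacked as one row per variable; it computes the sample covariance matrix with `np.cov` and its determinant, which is zero (up to rounding) for a singular matrix.

```python
import numpy as np

# The two variables from the question, one row per variable.
points = np.array([[63, 84], [46, 42]], dtype=float)

# 2x2 sample covariance matrix of the two variables.
cov = np.cov(points)

# A singular matrix has a determinant of zero (up to floating-point rounding),
# so it cannot be inverted.
det = np.linalg.det(cov)

print(cov)
print("singular:", np.isclose(det, 0.0))
```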
To avoid being necessarily collinear, two variables must have at least n=3 samples. In addition to this constraint, there is the limitation pointed out by ali_m: the number of samples n should be greater than or equal to the number of variables p. Putting the two together, n>=max(3,p); in this case p=2, so n>=3 is the right constraint.
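As a rough illustration of the n>=3 constraint, the sketch below adds a third sample to the data from the question. The third point (70, 50) is made up for this example and is not part of the original data; it was chosen so that the three points do not lie on one line. With it, `gaussian_kde` can be constructed and evaluated.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Original two samples plus one hypothetical third sample (70, 50),
# chosen so that the three points are not collinear.
x = np.array([63, 84, 70], dtype=float)
y = np.array([46, 42, 50], dtype=float)

# gaussian_kde expects an array of shape (n_variables, n_samples).
data = np.vstack([x, y])
kde = gaussian_kde(data)

# Evaluate the estimated density at a single point (70, 46).
density = kde(np.array([[70.0], [46.0]]))
print(density)
```

Note that if the third point happened to fall on the line through the first two, the covariance matrix would still be singular, so n>=3 is necessary but not sufficient on its own.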