KDE fails with two points?

The following trivial example fails with a singular-matrix error. Why? Is there any way to overcome it?

In:  from scipy.stats import gaussian_kde

In:  points
Out: (array([63, 84]), array([46, 42]))

In:  gaussian_kde(points)

LinAlgError: singular matrix
Josh asked Feb 15 '23 01:02


1 Answer

Looking at the backtrace, you can see that it fails when inverting the covariance matrix. This is due to exact multicollinearity in your data. By definition, your data have multicollinearity if two variables are collinear, i.e. if

the correlation between two independent variables is equal to 1 or -1
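To see the singular matrix directly, here is a minimal NumPy sketch. It assumes the question's `points` laid out as a 2 × 2 array (rows = variables, columns = samples), which is the `(d, N)` layout `gaussian_kde` expects. For these numbers the unbiased sample covariance is [[220.5, -42], [-42, 8]], whose determinant is 220.5 × 8 − 42² = 1764 − 1764 = 0. As far as I know, SciPy's `gaussian_kde` multiplies this data covariance by the square of its bandwidth factor, and scaling by a positive constant cannot make a singular matrix invertible.

```python
import numpy as np

# The question's data in the (d, N) layout gaussian_kde expects:
# row 0 and row 1 are the two variables, the columns are the two samples.
points = np.array([[63, 84],
                   [46, 42]], dtype=float)

# Unbiased sample covariance between the two variables (rows).
cov = np.cov(points)
print(cov)

# Eigenvalues in ascending order: the smallest one is (numerically) zero,
# so the covariance matrix is singular and has no inverse.
eigvals = np.linalg.eigvalsh(cov)
print(eigvals)
```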

In this case, each of the two variables has only two samples, so they are always collinear (trivially, there is always exactly one line passing through two distinct points). We can check that:

import numpy as np
np.corrcoef(np.array([63, 84]), np.array([46, 42]))
# [[ 1. -1.]
#  [-1.  1.]]
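The same check generalizes beyond this particular pair. The sketch below (the helper `sample_correlations` is hypothetical, written only for illustration) draws random data and shows that with two samples per variable the Pearson correlation always comes out as +1 or -1, while with three samples it is no longer forced to.

```python
import numpy as np

def sample_correlations(n_samples, trials=5, seed=0):
    """Pearson r between two random variables, each observed n_samples times."""
    rng = np.random.default_rng(seed)
    rs = []
    for _ in range(trials):
        x, y = rng.normal(size=(2, n_samples))  # row 0 -> x, row 1 -> y
        rs.append(np.corrcoef(x, y)[0, 1])
    return np.array(rs)

# Two samples: every correlation is +1 or -1 (up to rounding).
print(np.round(sample_correlations(2), 6))
# Three samples: the correlation is no longer forced to +/-1.
print(np.round(sample_correlations(3), 6))
```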

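For two variables not to be necessarily collinear, they must have at least n = 3 samples. On top of this constraint, there is the limitation pointed out by ali_m: the number of samples n should be greater than or equal to the number of variables p. Putting the two together,

n >= max(3, p)

In this case p = 2, so n >= 3 is the right constraint. Strictly speaking, the sample covariance is built from deviations around the mean, so its rank is at most n - 1; an invertible covariance therefore needs n >= p + 1, which for p = 2 again gives n >= 3. This condition is necessary but not sufficient: three points lying on one line still give a singular matrix.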
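The constraint can be turned into a pre-check before calling `gaussian_kde`. Below is a minimal sketch; the function name `kde_covariance_ok` and the `rtol` threshold are hypothetical, not part of SciPy. It rejects datasets with fewer than p + 1 samples (the sample covariance has rank at most n - 1) and otherwise treats the covariance as singular when its smallest eigenvalue is negligible compared with its largest.

```python
import numpy as np

def kde_covariance_ok(dataset, rtol=1e-10):
    """Hypothetical pre-check: True if the sample covariance of `dataset`
    (shape (p, n): p variables, n samples) is numerically invertible."""
    data = np.atleast_2d(np.asarray(dataset, dtype=float))
    p, n = data.shape
    # Deviations around the mean span at most n - 1 dimensions,
    # so fewer than p + 1 samples can never give a full-rank covariance.
    if n < p + 1:
        return False
    cov = np.atleast_2d(np.cov(data))
    eigvals = np.linalg.eigvalsh(cov)  # ascending, real for symmetric input
    # Singular if the smallest eigenvalue is negligible relative to the
    # largest (constant data gives a largest eigenvalue of 0).
    return bool(eigvals[-1] > 0 and eigvals[0] > rtol * eigvals[-1])

print(kde_covariance_ok([[63, 84], [46, 42]]))    # False: the question's data
print(kde_covariance_ok([[0, 1, 2], [0, 2, 4]]))  # False: three collinear points
print(kde_covariance_ok([[0, 1, 0], [0, 0, 1]]))  # True: three non-collinear points
```

The eigenvalue ratio is used instead of the determinant because the determinant scales with the units of the data, while the ratio does not.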

gg349 answered Feb 23 '23 00:02