KDE fails with two points?

Question

The following trivial example raises a "singular matrix" error. Why? Is there any way to overcome it?

In:  from scipy.stats import gaussian_kde

In:  points
Out: (array([63, 84]), array([46, 42]))

In:  gaussian_kde(points)
LinAlgError: singular matrix

gg349 · Accepted Answer

Looking at the backtrace, you can see that it fails when inverting the covariance matrix. This is due to exact multicollinearity in your data. By the usual definition, your data is multicollinear if two of its variables are collinear, i.e. if

the correlation between two independent variables is equal to 1 or -1

In this case each variable has only two samples, so the two variables are always collinear (trivially, there is always a line passing through two distinct points). We can check that:

import numpy as np
np.corrcoef(np.array([63, 84]), np.array([46, 42]))
[[ 1. -1.]
 [-1.  1.]]
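The same collinearity shows up directly in the covariance matrix that the KDE tries to invert. The following is a minimal sketch, assuming the data is arranged as one row per variable (the layout `np.cov` expects by default); the relative tolerance `1e-10` is an arbitrary choice made for this illustration.

```python
import numpy as np

# The two variables from the question, one row per variable (p=2 rows, n=2 columns).
points = np.array([[63, 84], [46, 42]], dtype=float)

# 2x2 sample covariance matrix of the two variables.
cov = np.cov(points)

# Rank of the covariance matrix; the tolerance is an arbitrary illustrative choice.
rank = np.linalg.matrix_rank(cov, tol=1e-10 * np.abs(cov).max())

# Determinant; it is zero (up to rounding) when the matrix is singular.
det = np.linalg.det(cov)

print(cov)
print("rank:", rank)
print("det:", det)
```

A rank below 2 and a determinant of (numerically) zero mean the matrix has no inverse, which is what the "singular matrix" error is reporting.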

To not be necessarily collinear, two variables must have at least n=3 samples. In addition, as ali_m points out, the number of samples n must be large enough relative to the number of variables p: n should be at least p, and in fact, because the sample mean is subtracted, n samples give a covariance matrix of rank at most n-1. Inverting it therefore requires

n>=p+1

In this case p=2, so n>=3 is the right constraint.
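The constraint can be checked numerically before calling `gaussian_kde`. The sketch below is only an illustration: `covariance_is_invertible` is a hypothetical helper (not part of SciPy or NumPy), the tolerance is an arbitrary choice, and the third sample `(70, 50)` is made up to show a non-collinear case.

```python
import numpy as np
from scipy.stats import gaussian_kde


def covariance_is_invertible(dataset):
    """Return True if the sample covariance of a (p, n) dataset has full rank p.

    Hypothetical helper for illustration; the tolerance is an arbitrary choice.
    """
    dataset = np.atleast_2d(np.asarray(dataset, dtype=float))
    p = dataset.shape[0]
    cov = np.cov(dataset)
    rank = np.linalg.matrix_rank(cov, tol=1e-10 * np.abs(cov).max())
    return bool(rank == p)


# The data from the question: p=2 variables, n=2 samples.
two_points = np.array([[63, 84], [46, 42]], dtype=float)

# Same data plus one made-up sample (70, 50) that is not on the same line.
three_points = np.array([[63, 84, 70], [46, 42, 50]], dtype=float)

print(covariance_is_invertible(two_points))
print(covariance_is_invertible(three_points))

# Only build the KDE when the covariance matrix can be inverted.
if covariance_is_invertible(three_points):
    kde = gaussian_kde(three_points)
    # Evaluate the estimated density at the single location (70, 45).
    density = kde(np.array([[70.0], [45.0]]))
    print(density)
```

Note that having n>=3 samples is necessary but not sufficient: three samples that happen to lie on one line would still fail the rank check.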

Tags: python, numpy, scipy, kernel-density

Josh
