
KernelDensity score() vs. score_samples() in Python scikit-learn

I have been using scikit-learn with Python for a few days now, specifically KernelDensity. Once the model is fitted, I would like to evaluate the probability of new points. The score() method seems to be made for this, but it apparently doesn't work: when I pass an array as input, the output is a single number. I use score_samples() instead, but it is very slow.

I think that score() is not working, but I don't have the skills to improve it. Please let me know if you have any ideas.

Romain asked Jul 10 '14 16:07




2 Answers

score() uses score_samples() as follows:

return np.sum(self.score_samples(X))

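So score() returns a single number, the total log-likelihood of all the points in X. That's why you should use score_samples() when you need one value per point.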
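The relationship can be checked with a minimal sketch. The training data, the bandwidth, and the names X_train and X_new below are made up for illustration and are not from the question:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Hypothetical data: 200 one-dimensional training points, 3 new points.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 1))
X_new = np.array([[-1.0], [0.0], [2.5]])

# Bandwidth chosen arbitrarily for the illustration.
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X_train)

log_dens = kde.score_samples(X_new)  # one log-density per row of X_new
total = kde.score(X_new)             # a single float for the whole array

print(log_dens.shape)                       # one entry per new point
print(np.isclose(total, log_dens.sum()))    # score() is the sum of score_samples()
```

So the single number coming out of score() is not a bug; it is the per-point log-densities added together.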

slava answered Sep 28 '22 02:09


It's a bit hard to tell, without any code, but:

Assume the points you want to evaluate are stored in an array X and you have a fitted kernel density estimator kde. Then you call:

logprobX = kde.score_samples(X)

But be careful: these values are log-densities! To get the densities themselves, you also need to do:

probX = np.exp(logprobX) 

These values should match your histogram, if you have calculated one and normalized it as a density.

The running time of these lines depends on the length of X. On my machine, it's quite fast to evaluate 7500 points.
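Put together, the steps above might look like the following sketch. The training sample, the bandwidth, and the evaluation grid are assumptions chosen for illustration; only the 7500 points echo the figure mentioned above:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Hypothetical training sample drawn from a standard normal distribution.
rng = np.random.default_rng(42)
samples = rng.normal(size=(1000, 1))

# Bandwidth picked by hand for this example.
kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(samples)

# 7500 evaluation points on a regular grid; shape must be (n_points, n_features).
grid = np.linspace(-6.0, 6.0, 7500)
X = grid.reshape(-1, 1)

logprobX = kde.score_samples(X)  # log-density at each point
probX = np.exp(logprobX)         # density values, comparable to a density histogram

# Sanity check: a density should integrate to roughly 1 (simple Riemann sum).
dx = grid[1] - grid[0]
integral = float((probX * dx).sum())
print(probX.shape, round(integral, 3))
```

Note that probX holds density values rather than probabilities, so individual entries can exceed 1 when the bandwidth is small.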

Ben Müller answered Sep 28 '22 02:09