I have been using scikit-learn and Python for a few days now, more specifically KernelDensity. Once the model is fitted, I would like to evaluate the probability of new points. The score() method seems to be made for this, but it apparently doesn't work: when I pass an array as input, the output is a single number. I use score_samples() instead, but it is very slow.
I think score() is not working, but I don't have the skills to improve it. Please let me know if you have any ideas.
A histogram results in a discontinuous shape, so it represents the data poorly. Another disadvantage is that the estimate changes with the size of the histogram bins, which makes it uncertain.
Kernel Density Estimation (KDE) is an unsupervised learning technique that estimates the PDF of a random variable in a non-parametric way. It is related to a histogram, but it smooths the data.
The KDE at a given location is calculated by weighting the distances from that location to all the data points we've seen. If we've seen more points nearby, the estimate is higher, indicating a higher probability of seeing a point at that location.
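To make the weighting concrete, here is a minimal sketch that computes a Gaussian KDE by hand and compares it with KernelDensity. The training sample, the grid and the bandwidth h are made up for illustration, and the data are 1-D to keep the normalisation simple.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
train = rng.normal(size=(200, 1))            # hypothetical 1-D training sample
grid = np.linspace(-3, 3, 5).reshape(-1, 1)  # locations where we evaluate the density
h = 0.5                                      # bandwidth, chosen arbitrarily

# Manual Gaussian KDE: average of Gaussian bumps centred on each training point.
diffs = (grid - train.T) / h                 # shape (5, 200): scaled distances
manual = np.exp(-0.5 * diffs**2).sum(axis=1) / (len(train) * h * np.sqrt(2 * np.pi))

# Same estimate from scikit-learn; score_samples() returns log-densities.
kde = KernelDensity(kernel="gaussian", bandwidth=h).fit(train)
sk = np.exp(kde.score_samples(grid))

print(manual)
print(sk)
```

Both arrays should agree up to floating-point error, since each grid value is just a sum of kernel weights over the training points.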
score() uses score_samples() as follows:
return np.sum(self.score_samples(X))
So score() returns the total log-likelihood of all points in X as a single number, and that's why you should use score_samples() in your case.
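A quick way to check this relationship yourself, as a small sketch with made-up data (the arrays train and X and the bandwidth are assumptions, not taken from the question):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 1))  # hypothetical data used to fit the model
X = rng.normal(size=(10, 1))       # hypothetical new points to evaluate

kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(train)

total = kde.score(X)               # one number: total log-likelihood of X
per_point = kde.score_samples(X)   # one log-density per row of X

print(total, per_point.sum())
print(per_point.shape)
```

The two printed numbers on the first line should match, which shows that score() is not broken, it just answers a different question.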
It's a bit hard to tell without any code, but assuming the points you want to evaluate are stored in an array X and you have a fitted kernel density estimate kde, you call:
logprobX = kde.score_samples(X)
But be careful, these are log-densities! To get the density values, you also need to do:
probX = np.exp(logprobX)
These values match your histogram, if you have computed one and normalized it as a density.
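Here is a rough sketch of that comparison. The sample, the bandwidth and the number of bins are arbitrary choices for illustration; the point is only that the exponentiated values behave like a density, the same way a histogram with density=True does.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 1))   # hypothetical sample

kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(train)

# Evaluate the density on a fine grid that covers the data with some margin.
grid = np.linspace(-6, 6, 1201).reshape(-1, 1)
dx = grid[1, 0] - grid[0, 0]
dens = np.exp(kde.score_samples(grid))

# Both the KDE and a density-normalized histogram integrate to about 1.
kde_area = (dens * dx).sum()
hist, edges = np.histogram(train[:, 0], bins=30, density=True)
hist_area = (hist * np.diff(edges)).sum()

print(kde_area, hist_area)
```

If you plot dens against grid on top of the histogram bars, the curve should follow the bar heights, just smoother.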
The running time of these lines depends on the length of X. On my machine, it's quite fast to calculate 7500 points.
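If you want to measure this on your own machine, something like the sketch below may help. The data sizes are made up, and the second model uses the rtol parameter of KernelDensity to allow an approximate result; whether that actually speeds things up depends on your data and bandwidth, so treat it as something to try rather than a guaranteed fix.

```python
import time
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
train = rng.normal(size=(2000, 1))        # hypothetical training data
X = rng.uniform(-2, 2, size=(7500, 1))    # hypothetical points to evaluate

exact = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(train)
# rtol > 0 lets the tree-based search stop early with an approximate answer.
approx = KernelDensity(kernel="gaussian", bandwidth=0.3, rtol=1e-4).fit(train)

def timed(model, points):
    """Return the log-densities and the wall-clock time of score_samples()."""
    start = time.perf_counter()
    out = model.score_samples(points)
    return out, time.perf_counter() - start

log_exact, t_exact = timed(exact, X)
log_approx, t_approx = timed(approx, X)

print(f"exact:  {t_exact:.3f} s")
print(f"approx: {t_approx:.3f} s")
```

Compare np.exp(log_exact) and np.exp(log_approx) on your own data before relying on the approximate version.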