PDF estimation in Scikit-Learn KDE

Tags: python, scikit-learn, kernel-density

I am trying to compute a PDF estimate from a KDE fitted with the scikit-learn module. I have seen two variants of scoring and I am trying both: statements A and B below.

Statement A results in the following error:

AttributeError: 'KernelDensity' object has no attribute 'tree_'

Statement B results in the following error:

ValueError: query data dimension must match training data dimension

It seems like a silly error, but I cannot figure it out. Please help. The code is below.

from sklearn.neighbors import KernelDensity
import numpy

# d is my 1-D array data
xgrid = numpy.linspace(d.min(), d.max(), 1000)

density = KernelDensity(kernel='gaussian', bandwidth=0.08804).fit(d)

# statement A
density_score = KernelDensity(kernel='gaussian', bandwidth=0.08804).score_samples(xgrid)

# statement B
density_score = density.score_samples(xgrid)

density_score = numpy.exp(density_score)

If it helps, I am using scikit-learn version 0.15.2. I have run the same data successfully through scipy.stats.gaussian_kde, so the data itself is not the problem.

asked Dec 17 '14 06:12

mlworker

2 Answers

With statement B, I had the same issue with this error:


 ValueError: query data dimension must match training data dimension

The issue here is that your data is a 1-D array, but fit() expects a 2-D array of shape (n_samples, n_features). When you pass it a 1-D array, it assumes you have only one data point with many dimensions. For example, if your training data has 100000 points, d should be treated as 100000x1, but fit() takes it as 1x100000.
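To make the shapes concrete, here is a small NumPy-only sketch. The array `d` is a hypothetical stand-in for the asker's data, and `np.atleast_2d` is used only to illustrate the "one sample, many features" reading described above; it is not a claim about how scikit-learn promotes arrays internally.

```python
import numpy as np

# Hypothetical 1-D data standing in for the asker's array `d`
d = np.arange(5, dtype=float)

# Reading a 1-D array as a single row: 1 sample with 5 features
as_single_row = np.atleast_2d(d)

# What is actually wanted here: 5 samples with 1 feature each
as_column = d.reshape(-1, 1)

print(d.shape, as_single_row.shape, as_column.shape)
```

The `-1` in `reshape(-1, 1)` lets NumPy infer the number of rows from the array length.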

So, you should reshape d before fitting, using d.reshape(-1,1), and do the same for the grid, using xgrid.reshape(-1,1):

density = KernelDensity(kernel='gaussian', bandwidth=0.08804).fit(d.reshape(-1,1))
density_score = density.score_samples(xgrid.reshape(-1,1))

Note: the issue with statement A is that you are calling score_samples on a new KernelDensity object that has not been fitted yet.
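Putting the pieces together, a minimal end-to-end sketch might look like the following. The synthetic normal sample is an assumption made only so the example runs on its own; the bandwidth is the one from the question. The last lines are a rough sanity check that the estimated density integrates to approximately 1 over the grid.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Synthetic stand-in for the 1-D array `d` (an assumption for illustration)
rng = np.random.default_rng(0)
d = rng.normal(loc=0.0, scale=1.0, size=500)

xgrid = np.linspace(d.min(), d.max(), 1000)

# Fit on a column vector: shape (n_samples, 1)
kde = KernelDensity(kernel="gaussian", bandwidth=0.08804).fit(d.reshape(-1, 1))

# score_samples returns the log-density, so exponentiate to get the PDF
log_density = kde.score_samples(xgrid.reshape(-1, 1))
pdf = np.exp(log_density)

# Rough check: a simple Riemann sum over the grid should be close to 1
dx = xgrid[1] - xgrid[0]
area = float(np.sum(pdf) * dx)
print(pdf.shape, area)
```

The area comes out slightly below 1 because the grid stops at the minimum and maximum of the data, so a little of the kernel mass at the edges falls outside it.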

answered Oct 17 '22 02:10

Vahid Mirjalili

You need to call the fit() function before you can score samples with the estimator.
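A short sketch of the difference between an unfitted and a fitted estimator, using made-up data and an arbitrary bandwidth. The exact exception depends on the scikit-learn version: 0.15.2 reports the missing `tree_` attribute as in the question, while recent releases, as far as I know, raise `NotFittedError`, which is also an `AttributeError`. Catching `AttributeError` is therefore assumed to cover both cases.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Made-up training data and query points, both as column vectors
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
query = np.linspace(-3, 3, 50).reshape(-1, 1)

# Scoring with an estimator that was never fitted fails
unfitted = KernelDensity(kernel="gaussian", bandwidth=0.5)
try:
    unfitted.score_samples(query)
    raised = False
except AttributeError as exc:
    raised = True
    print("unfitted estimator:", type(exc).__name__)

# Fitting first makes the same call succeed
fitted = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X)
log_density = fitted.score_samples(query)
print(log_density.shape)
```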

answered Oct 17 '22 00:10

user1793558
