I am trying to use KNN to correctly classify .wav files into two groups, group 0 and group 1.
I extracted the data and created the model, but when I try to fit it I get the following error:
Traceback (most recent call last):
File "/..../....../KNN.py", line 20, in <module>
classifier.fit(X_train, y_train)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/neighbors/base.py", line 761, in fit
X, y = check_X_y(X, y, "csr", multi_output=True)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/utils/validation.py", line 521, in check_X_y
ensure_min_features, warn_on_dtype, estimator)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/utils/validation.py", line 405, in check_array
% (array.ndim, estimator_name))
ValueError: Found array with dim 3. Estimator expected <= 2.
I have found these two Stack Overflow posts, which describe similar issues:
sklearn Logistic Regression "ValueError: Found array with dim 3. Estimator expected <= 2."
Error: Found array with dim 3. Estimator expected <= 2
Correct me if I'm wrong, but it appears that scikit-learn can only accept 2-dimensional data.
My training data has shape (3240, 20, 5255), i.e. 3240 .wav files, each represented by a 20 x 5255 feature array.
My label data has shape (3240,) # each entry is category 0 or 1
What code can I use to manipulate my training and label data to convert it into a form that is usable by scikit-learn? Also, how can I ensure that data is not lost when I go down from 3 dimensions to 2 dimensions?
It is true, sklearn works only with 2D data.
What you can try to do:
1. Just use np.reshape on the training data to convert it to shape (3240, 20*5255). It will preserve all the original information. But sklearn will not be able to exploit the implicit structure in this data (e.g. that features 1, 21, 41, etc. are different versions of the same variable).
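A minimal sketch of this first option, assuming your features live in an array X of shape (3240, 20, 5255) and your labels in y; the variable names, the zero-filled stand-in data, and n_neighbors=5 are placeholders, not from the original post:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Stand-ins for the real extracted features and labels (hypothetical names)
X = np.zeros((3240, 20, 5255), dtype=np.float32)  # extracted .wav features
y = np.zeros(3240, dtype=int)                     # category 0 or 1 per file

# Collapse the last two axes: each sample becomes one flat row of
# 20 * 5255 = 105100 values. reshape only changes the view of the data,
# so no information is lost.
X_2d = X.reshape(X.shape[0], -1)                  # shape: (3240, 105100)

classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_2d, y)                           # input is 2D, so fit() succeeds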
2. Use a convolutional neural network (e.g. the tensorflow+Keras stack). CNNs were designed specially to handle such multidimensional data and exploit its structure. But they have lots of hyperparameters to tune.
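For illustration, a tiny Keras model along these lines might look like the sketch below; every layer size and kernel shape here is an arbitrary assumption to be tuned, not a recommendation from the original answer:

from tensorflow import keras

# Treat each sample as a 20 x 5255 single-channel "image".
model = keras.Sequential([
    keras.layers.Input(shape=(20, 5255, 1)),
    keras.layers.Conv2D(16, kernel_size=(3, 9), activation="relu"),
    keras.layers.MaxPooling2D(pool_size=(2, 4)),
    keras.layers.Conv2D(32, kernel_size=(3, 9), activation="relu"),
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(1, activation="sigmoid"),  # binary: group 0 vs group 1
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# The network expects a trailing channel axis:
# (3240, 20, 5255) -> (3240, 20, 5255, 1)
# model.fit(X[..., None], y, epochs=10, validation_split=0.2)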
3. Use dimensionality reduction (e.g. PCA) on the data reshaped to (3240, 20*5255). It will try to preserve as much information as possible, while still keeping the number of features low. If you had more data (e.g. 100K examples), the first approach might work best. In your case (3K examples and roughly 100K features) you need to regularize your model heavily to avoid overfitting.
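A sketch of this third option, reusing the flattened X_2d and y from the first snippet; n_components=200 is an arbitrary placeholder to be tuned with cross-validation, along with n_neighbors:

from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Project the 105100 flattened features down to 200 principal components,
# then run KNN in that much smaller space.
model = make_pipeline(
    PCA(n_components=200),
    KNeighborsClassifier(n_neighbors=5),
)
model.fit(X_2d, y)  # X_2d and y as defined in the first sketch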