Found array with 0 sample(s) (shape=(0, 40)) while a minimum of 1 is required

I'm testing a simple prediction program with Python 2.7, sklearn 0.17.1 and numpy 1.11.0. I have a matrix of probabilities from an LDA model, and now I want to create a RandomForestClassifier to predict results from those probabilities. My code is:

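import numpy as np
from sklearn.cross_validation import cross_val_score  # module path in sklearn 0.17
from sklearn.ensemble import RandomForestClassifier

maxlen = 40  # length of each dense probability vector
props = []
for doc in corpus:
    # get_document_topics returns (topic_id, probability) pairs
    topics = model.get_document_topics(doc)
    tprops = [0] * maxlen
    for topic in topics:
        # index by the pair's topic id, store the pair's probability
        tprops[topic[0]] = topic[1]
    props.append(tprops)

ntheta = np.array(props)
ny = np.array(y)

clf = RandomForestClassifier(n_estimators=100)
accuracy = cross_val_score(clf, ntheta, ny, scoring='accuracy')
print accuracy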

ValueError                                Traceback (most recent call last)
<ipython-input-65-a7d276df43e9> in <module>()
      1 # clf.fit(nteta, ny)
      2 print nteta.shape, ny.shape
----> 3 accuracy = cross_val_score(clf, nteta, ny, scoring = 'accuracy')
      4 print accuracy

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.pyc in cross_val_score(estimator, X, y, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
   1431                                               train, test, verbose, None,
   1432                                               fit_params)
-> 1433                       for train, test in cv)
   1434     return np.array(scores)[:, 0]
   1435 

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
    798             # was dispatched. In particular this covers the edge
    799             # case of Parallel used with an exhausted iterator.
--> 800             while self.dispatch_one_batch(iterator):
    801                 self._iterating = True
    802             else:

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in dispatch_one_batch(self, iterator)
    656                 return False
    657             else:
--> 658                 self._dispatch(tasks)
    659                 return True
    660 

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in _dispatch(self, batch)
    564 
    565         if self._pool is None:
--> 566             job = ImmediateComputeBatch(batch)
    567             self._jobs.append(job)
    568             self.n_dispatched_batches += 1

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __init__(self, batch)
    178         # Don't delay the application, to avoid keeping the input
    179         # arguments in memory
--> 180         self.results = batch()
    181 
    182     def get(self):

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self)
     70 
     71     def __call__(self):
---> 72         return [func(*args, **kwargs) for func, args, kwargs in self.items]
     73 
     74     def __len__(self):

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.pyc in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, error_score)
   1529             estimator.fit(X_train, **fit_params)
   1530         else:
-> 1531             estimator.fit(X_train, y_train, **fit_params)
   1532 
   1533     except Exception as e:

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/ensemble/forest.pyc in fit(self, X, y, sample_weight)
    210         """
    211         # Validate or convert input data
--> 212         X = check_array(X, dtype=DTYPE, accept_sparse="csc")
    213         if issparse(X):
    214             # Pre-sort indices to avoid that each individual tree of the

/home/egor/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    405                              " minimum of %d is required%s."
    406                              % (n_samples, shape_repr, ensure_min_samples,
--> 407                                 context))
    408 
    409     if ensure_min_features > 0 and array.ndim == 2:

ValueError: Found array with 0 sample(s) (shape=(0, 40)) while a minimum of 1 is required.

UPD: Why did this get two downvotes? Please keep the criticism constructive.


UPD

cotique found that y was filled incorrectly (it should have contained different classes). When y is filled correctly, the problem does not occur. In my case the classes were wrong and there were 39774 of them. Still, that is not a real answer: it does not explain why the error occurs when there are 39774 classes to predict.

egorlitvinenko asked Jun 04 '16 16:06


1 Answer

This is the original code from the scikit-learn repo (validation.py#L409):

if ensure_min_samples > 0:
    n_samples = _num_samples(array)
    if n_samples < ensure_min_samples:
        raise ValueError("Found array with %d sample(s) (shape=%s) while a"
                         " minimum of %d is required%s."
                         % (n_samples, shape_repr, ensure_min_samples,
                            context))

So n_samples comes from _num_samples(array), where array is the input object being checked or converted.

Next, validation.py#L111:

def _num_samples(x):
    """Return number of samples in array-like x."""
    if hasattr(x, 'fit'):
        # stuff
    if not hasattr(x, '__len__') and not hasattr(x, 'shape'):
        # stuff
    if hasattr(x, 'shape'):
        if len(x.shape) == 0:
            # raise TypeError
        return x.shape[0]
    else:
        return len(x)

So the number of samples equals the length of the first dimension of array, which is 0 because array.shape = (0, 40). In other words, the estimator was handed a training set with 40 columns but no rows.
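The check described above can be reproduced without scikit-learn. What follows is a minimal sketch rather than the library's code: `ShapeOnly`, `num_samples` and `check_min_samples` are hypothetical names, and `ShapeOnly` is only a stand-in for a NumPy array that exposes a `shape` attribute.

```python
class ShapeOnly(object):
    """Hypothetical stand-in for an array: exposes only `shape`."""

    def __init__(self, shape):
        self.shape = shape


def num_samples(x):
    """Simplified version of the sample-counting logic quoted above."""
    if hasattr(x, "shape"):
        if len(x.shape) == 0:
            raise TypeError("Singleton array cannot be considered a valid collection.")
        # The first dimension is the number of rows (samples).
        return x.shape[0]
    return len(x)


def check_min_samples(x, ensure_min_samples=1):
    """Raise ValueError when x has fewer rows than ensure_min_samples."""
    n_samples = num_samples(x)
    if n_samples < ensure_min_samples:
        shape = getattr(x, "shape", (n_samples,))
        raise ValueError(
            "Found array with %d sample(s) (shape=%s) while a minimum of %d is required."
            % (n_samples, shape, ensure_min_samples))
    return n_samples


# An empty training set with 40 feature columns, as in the traceback.
empty_train = ShapeOnly((0, 40))
try:
    check_min_samples(empty_train)
except ValueError as exc:
    message = str(exc)
print(message)
```

The message produced here has the same form as the one in the question, which suggests the real problem lies upstream: something handed fit() a training set with zero rows.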

And I don't know what this all means, but I hope it makes things clearer.
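As for why the training array can end up empty: in scikit-learn 0.17, cross_val_score on a classifier defaults to stratified 3-fold splitting. One plausible explanation, assuming the splitter deals each class's samples out to the test folds independently and always starts at the first fold, is that when (nearly) every sample has its own class, every sample lands in the first test fold and nothing is left to train on. The sketch below is a simplified model of that idea in plain Python; `assign_test_folds` and `train_indices` are hypothetical helpers, not scikit-learn's implementation.

```python
from collections import defaultdict


def assign_test_folds(y, n_folds=3):
    """Deal each class's samples out to test folds 0, 1, 2, ... in turn.

    Simplified, hypothetical model of stratified splitting; not the
    library's implementation.
    """
    seen_per_class = defaultdict(int)
    test_folds = []
    for label in y:
        test_folds.append(seen_per_class[label] % n_folds)
        seen_per_class[label] += 1
    return test_folds


def train_indices(test_folds, fold):
    """Indices of the samples used for training when `fold` is held out."""
    return [i for i, f in enumerate(test_folds) if f != fold]


# Every sample has its own class, as when y holds unique values by mistake.
y_unique = list(range(10))
folds_unique = assign_test_folds(y_unique)
train_sizes_unique = [len(train_indices(folds_unique, k)) for k in range(3)]
print(train_sizes_unique)

# Two classes with five samples each.
y_two = [0, 1] * 5
folds_two = assign_test_folds(y_two)
train_sizes_two = [len(train_indices(folds_two, k)) for k in range(3)]
print(train_sizes_two)
```

Under this model the all-unique labels give a first training fold of size 0, which would match shape=(0, 40), while the two-class labels leave every training fold non-empty. Counting the members of each class in y (for example with collections.Counter) before cross-validating is a cheap way to catch this kind of mistake.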

cotique answered Oct 20 '22 17:10