How do I create a sklearn.datasets.base.Bunch object in scikit-learn from my own data?

For most of the scikit-learn algorithms, the data apparently has to be loaded as a Bunch object. Many examples in the tutorial use load_files() or similar functions to populate the Bunch object. Functions like load_files() expect the data to be laid out in a particular format, but my data is stored differently: it is a CSV file in which every field is a string.

How do I parse this file and load the data into a Bunch object?

asked Dec 10 '13 03:12

David

1 Answer

You can do it like this:

import numpy as np
# In older releases this class lived at sklearn.datasets.base.Bunch
from sklearn.utils import Bunch

# Raw samples (here, short text documents)
examples = ['some text', 'another example text', 'example 3']

# One integer class label per sample
target = np.array([0, 1, 0], dtype=np.int64)

dataset = Bunch(data=examples, target=target)
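
Since the question is about a CSV file with string fields, here is a minimal sketch of the same idea reading from CSV. The column names "text" and "label", the sample rows, and the helper load_csv_as_bunch are all made up for illustration; adjust them to match your file. It assumes a scikit-learn version that provides sklearn.utils.Bunch.

```python
import csv
import io

import numpy as np
from sklearn.utils import Bunch

# Hypothetical CSV content standing in for a real file on disk.
CSV_TEXT = """text,label
some text,spam
another example text,ham
example 3,spam
"""


def load_csv_as_bunch(fileobj, text_column="text", label_column="label"):
    """Read a CSV with string fields and return a Bunch with data and target.

    The column names are assumptions for this sketch, not a scikit-learn
    convention.
    """
    rows = list(csv.DictReader(fileobj))
    data = [row[text_column] for row in rows]
    labels = [row[label_column] for row in rows]
    # np.unique gives the sorted distinct label names and, with
    # return_inverse=True, the integer index of each row's label.
    target_names, target = np.unique(labels, return_inverse=True)
    return Bunch(data=data, target=target, target_names=target_names)


dataset = load_csv_as_bunch(io.StringIO(CSV_TEXT))
print(dataset.data)
print(dataset.target, dataset.target_names)
```

For a real file you would pass an open file handle instead of the StringIO, for example open("my_data.csv", newline="") with your own path. Note that a Bunch is only a dictionary-like container with attribute access; estimators take the data and target arrays themselves, so dataset.data and dataset.target are what you would hand to a vectorizer or classifier.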

answered Sep 28 '22 00:09

Hugh Perkins

Tags: scikit-learn, scikits