MultinomialNB error: "Unknown Label Type"

Tags: python, numpy, scikit-learn

I have two numpy arrays, X_train and Y_train. The first, of shape (700, 1000), is populated by the values 0, 1, 2, 3, 4, and 10. The second, of shape (700,), is populated by the values 'fresh' or 'rotten', since I'm working with the Rotten Tomatoes API. For some reason, when I execute:

from sklearn.naive_bayes import MultinomialNB
nb = MultinomialNB()
nb.fit(X_train, Y_train)

I get:

ValueError: Unknown label type

I tried building a smaller pair of arrays:

print xs, '\n', ys

gives

[[0 0 0 0 1]
 [1 0 0 2 5]
 [3 2 5 5 0]
 [3 2 0 0 1]
 [1 5 1 0 0]]

['rotten' 'fresh' 'fresh' 'rotten' 'fresh']

and the multinomial NB fit gives no Unknown Label error. Any ideas on why this is happening?

I also checked the unique values in X_train, Y_train with numpy.unique and it doesn't seem like there are any weird or mistyped labels -- it's all 'fresh' or 'rotten'.

My code for generating X_train and Y_train:

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def make_xy(critics, vectorizer=None):
    stext = critics['quote'].tolist()  # need to have a list
    if vectorizer is None:
        vectorizer = CountVectorizer(min_df=0)
    vectorizer.fit(stext)
    X = vectorizer.transform(stext).toarray()  # this is X
    Y = np.asarray(critics['fresh'])  # labels: 'fresh' or 'rotten'
    return X[0:1000, 0:1000], Y[0:1000]  # this is X_train, Y_train

where 'critics' is a pandas dataframe imported from a CSV file (https://www.dropbox.com/s/0lu5oujfm483wtr/critics.csv), and cleaned of any missing data:

critics = pd.read_csv('critics.csv')
critics = critics[~critics.quote.isnull()]
critics = critics[critics.fresh != 'none']
critics = critics[critics.quote.str.len() > 0]


asked Dec 21 '13 19:12

covariance

1 Answer

The problem seems to be the dtype of Y. It looks like numpy didn't manage to figure out that it was a string array, so the dtype was set to the generic object. If you change:

Y = np.asarray(critics['fresh'])

to

Y = np.asarray(critics['fresh'], dtype="|S6")

I think it should work.
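To check whether this is what is happening, it may help to look at the dtype of the label array directly. The sketch below is only an illustration: y_obj is a hypothetical stand-in for np.asarray(critics['fresh']), built by hand with the object dtype rather than loaded from the CSV. It shows two possible conversions: a cast to a string dtype (using str instead of "|S6", which on Python 3 would produce byte strings) and an integer encoding of the labels. Whether the hand-built xs, ys pair in the question worked because its labels already had a string dtype is an assumption, but it would be consistent with this explanation.

```python
import numpy as np

# Hypothetical stand-in for np.asarray(critics['fresh']): an array of
# Python strings stored with the generic object dtype.
y_obj = np.array(['rotten', 'fresh', 'fresh', 'rotten', 'fresh'], dtype=object)
print(y_obj.dtype)  # object

# Option 1: cast to a fixed-width string dtype. On Python 3, `str` gives
# unicode strings, whereas "|S6" would give byte strings.
y_str = y_obj.astype(str)
print(y_str.dtype)

# Option 2: encode the labels as integers. np.unique returns the sorted
# classes plus, for each element, the index of its class.
classes, y_int = np.unique(y_str, return_inverse=True)
print(classes, y_int)
```

Either y_str or y_int could then be passed as the label argument to nb.fit in place of the object-dtype array; with the integer encoding, classes[i] recovers the original label for class index i.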


answered Oct 14 '22 18:10

M4rtini
