scikit learn: Problems creating customized CountVectorizer and ChiSquare

I have the following code (based on the samples here), but it is not working:

[...]
def my_analyzer(s):
    return s.split()
my_vectorizer = CountVectorizer(analyzer=my_analyzer)
X_train = my_vectorizer.fit_transform(traindata)

ch2 = SelectKBest(chi2,k=1)
X_train = ch2.fit_transform(X_train,Y_train)
[...]

The following error is given when calling fit_transform:

AttributeError: 'function' object has no attribute 'analyze'

According to the documentation, CountVectorizer should be created like this: vectorizer = CountVectorizer(tokenizer=my_tokenizer). However, if I do that, I get the following error: "got an unexpected keyword argument 'tokenizer'".

I am currently using scikit-learn 0.10.

D T asked Mar 06 '26 19:03

1 Answer

You're looking at the documentation for 0.11 (to be released soon), where the vectorizer has been overhauled. Check the documentation for 0.10, where there is no tokenizer argument and the analyzer should be an object implementing an analyze method:

class MyAnalyzer(object):
    @staticmethod
    def analyze(s):
        return s.split()

v = CountVectorizer(analyzer=MyAnalyzer())

http://scikit-learn.org/dev is the documentation for the upcoming release (which may change at any time), while http://scikit-learn.org/stable has the documentation for the current stable version.
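If you already have plain functions like my_analyzer, a small adapter saves writing a new class for each one. The following is only a sketch: FunctionAnalyzer is a hypothetical name, not part of scikit-learn, and it assumes that 0.10 needs nothing from the analyzer beyond an analyze method, as described above.

```python
class FunctionAnalyzer(object):
    """Hypothetical adapter: wraps a plain function in an object
    that exposes the analyze() method expected by 0.10."""

    def __init__(self, func):
        self.func = func

    def analyze(self, text):
        # Delegate to the wrapped function and return its tokens.
        return self.func(text)


def my_analyzer(s):
    # Same whitespace tokenizer as in the question.
    return s.split()


analyzer = FunctionAnalyzer(my_analyzer)
tokens = analyzer.analyze("the quick brown fox")
print(tokens)

# Assumed usage with scikit-learn 0.10 (not executed here):
# v = CountVectorizer(analyzer=FunctionAnalyzer(my_analyzer))
```

The adapter itself does not import scikit-learn, so it can be checked on its own before being passed to CountVectorizer.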

Fred Foo answered Mar 09 '26 07:03
