
Create vocabulary dictionary for text mining

I have the following code:

train_set = ("The sky is blue.", "The sun is bright.")
test_set = ("The sun in the sky is bright.",
    "We can see the shining sun, the bright sun.")

Now I'm trying to calculate the word frequencies like this:

    from sklearn.feature_extraction.text import CountVectorizer
    vectorizer = CountVectorizer()

Next I would like to print the vocabulary, so I do:

vectorizer.fit_transform(train_set)
print vectorizer.vocabulary

Right now I get the output None, while I expect something like:

{'blue': 0, 'sun': 1, 'bright': 2, 'sky': 3}

Any thoughts on where this goes wrong?

Frits Verstraten asked Feb 28 '26 23:02


1 Answer

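After fitting, the learned vocabulary is stored in the attribute vocabulary_ (with a trailing underscore). The attribute vocabulary only holds the constructor parameter, which defaults to None. Try this: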

print vectorizer.vocabulary_
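
For reference, here is a minimal sketch of the same fix in Python 3 syntax (the question uses the Python 2 print statement). It fits the vectorizer on train_set and then reads the fitted mapping. The stop_words="english" argument is an assumption, added only so that common words such as "the" and "is" are dropped, as the expected output in the question suggests; with the default settings those words stay in the vocabulary. The variable name vocab is illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer

train_set = ("The sky is blue.", "The sun is bright.")

# Assumption: English stop words are removed to match the
# expected output in the question; the default keeps them.
vectorizer = CountVectorizer(stop_words="english")

# Fitting learns the vocabulary from the training documents.
vectorizer.fit(train_set)

# The constructor parameter is untouched by fitting and stays None.
print(vectorizer.vocabulary)

# The fitted term -> column index mapping lives in vocabulary_.
vocab = vectorizer.vocabulary_
print(sorted(vocab.items()))
```

Note that the column indices follow the alphabetical order of the terms, so blue, bright, sky and sun map to 0, 1, 2 and 3. The numbering shown in the question is therefore not what you will get.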
José Sánchez answered Mar 02 '26 13:03


