I'm trying to apply TF-IDF to a pandas DataFrame column.
My data (a single column, all_cols):
0 who is your hero and why
1 what do you do to relax
2 this is a hero
4 how many hours of sleep do you get a night
5 describe the last time you were relax
I know that to use CountVectorizer I need to turn the column into a list (and that's what I tried to do).
To apply TF-IDF, I could not pass a list, so I tried converting it to strings.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
import pandas as pd
df = pd.read_excel('data.xlsx')
col = df['all_cols']
corpus = col.values.tolist()
cv = CountVectorizer()
X = cv.fit_transform(corpus)
document = [' '.join(str(item)) for item in corpus]
tfidf_transformer = TfidfTransformer(smooth_idf=True, use_idf=True)
tfidf_transformer.fit(X)
feature_names = cv.get_feature_names_out()  # get_feature_names() was removed in scikit-learn 1.2
tf_idf_vector = tfidf_transformer.transform(cv.transform([document]))  # this line raises the error
But I still get this error:
AttributeError Traceback (most recent call last)
<ipython-input-239-92f296939ea7> in <module>()
16
---> 17 tf_idf_vector=tfidf_transformer.transform(cv.transform([document]))
AttributeError: 'list' object has no attribute 'lower'
I'm just guessing, because I don't use sklearn and you didn't post the full stack trace, but the exception suggests that transform() expects a list of strings and calls lower() on each element.
What you are passing, however, is a list containing a list of strings:
corpus = [1, 2, 3]
document = [' '.join(str(item)) for item in corpus]
print(document)
# ['1', '2', '3']
print([document])
# [['1', '2', '3']]
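This can be checked against a fitted CountVectorizer, which lowercases every document by default (lowercase=True). The sketch below uses a made-up two-row corpus rather than the real spreadsheet, and assumes a recent scikit-learn; wrapping the list in another list reproduces the same AttributeError.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical two-row corpus standing in for the question's column
corpus = ["who is your hero and why", "what do you do to relax"]

cv = CountVectorizer()  # lowercase=True by default
cv.fit(corpus)

message = None
try:
    # [corpus] is a list whose single "document" is itself a list,
    # and lowercasing that document fails.
    cv.transform([corpus])
except AttributeError as exc:
    message = str(exc)

print(message)  # 'list' object has no attribute 'lower'
```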
I bet it will be fixed if you pass the list itself instead:
tf_idf_vector=tfidf_transformer.transform(cv.transform(document))
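One further problem sits in the list comprehension that builds document: ' '.join(str(item)) joins the characters of each string, so every sentence turns into single letters separated by spaces. CountVectorizer's default token_pattern only keeps tokens of two or more word characters, so those rows come out empty even once the call no longer fails. Since the column already holds strings, the list can be passed unchanged. A minimal sketch, again with a hypothetical two-row corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical two-row corpus standing in for df['all_cols'].tolist()
corpus = ["who is your hero and why", "what do you do to relax"]

cv = CountVectorizer()
cv.fit(corpus)

# ' '.join(text) iterates over the characters of text, so each sentence
# becomes single characters separated by spaces ("w h o   i s ...").
document = [' '.join(str(item)) for item in corpus]

# The default token_pattern keeps only tokens of two or more word
# characters, so every single-character token is dropped: all-zero rows.
empty_nnz = cv.transform(document).nnz
print(empty_nnz)  # 0

# The column already holds strings, so the list can be passed unchanged.
fixed_nnz = cv.transform(corpus).nnz
print(fixed_nnz)  # 11
```

With the real data this means tfidf_transformer.transform(cv.transform(corpus)) is enough; the document list is not needed at all.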
Alternatively, a scikit-learn Pipeline can simplify this:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.pipeline import Pipeline
tf_idf = Pipeline([
    ('cv', CountVectorizer()),
    ('tfidf_transformer', TfidfTransformer(smooth_idf=True, use_idf=True)),
])
tf_idf_vector = tf_idf.fit_transform(corpus)