 

AttributeError: 'list' object has no attribute 'lower' in TF-IDF

I'm trying to apply TF-IDF to a Pandas column:

data

    all_cols
0   who is your hero and why
1   what do you do to relax
2   this is a hero
4   how many hours of sleep do you get a night
5   describe the last time you were relax

I know that to use CountVectorizer I need to turn the column into a list (and that's what I tried to do).

To apply TF-IDF, though, I couldn't pass a list, so I tried to convert it to a string.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
import pandas as pd


df = pd.read_excel('data.xlsx')
col = df['all_cols']
corpus = col.values.tolist()

cv = CountVectorizer()
X = cv.fit_transform(corpus)

document = [' '.join(str(item)) for item in corpus]

tfidf_transformer=TfidfTransformer(smooth_idf=True,use_idf=True)
tfidf_transformer.fit(X)

feature_names=cv.get_feature_names()

tf_idf_vector=tfidf_transformer.transform(cv.transform([document]))

But I still get this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-239-92f296939ea7> in <module>()
     16  
---> 17 tf_idf_vector=tfidf_transformer.transform(cv.transform([documento]))

AttributeError: 'list' object has no attribute 'lower'
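A minimal, self-contained reproduction of this error, assuming the sample sentences above are typed inline instead of read from data.xlsx. The `error_message` variable is a hypothetical helper that only captures the exception text; the failing call is the same `cv.transform([document])` as in the question.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical inline copy of the sample data (the question reads it from data.xlsx)
df = pd.DataFrame({"all_cols": [
    "who is your hero and why",
    "what do you do to relax",
    "this is a hero",
    "how many hours of sleep do you get a night",
    "describe the last time you were relax",
]})
corpus = df["all_cols"].tolist()

cv = CountVectorizer()
cv.fit(corpus)

document = [" ".join(str(item)) for item in corpus]

error_message = None
try:
    # [document] is a list whose only element is itself a list, so the
    # vectorizer ends up calling .lower() on a list instead of a string
    cv.transform([document])
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```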
asked Sep 03 '25 06:09 by marin


2 Answers

I'm just guessing, because I don't use sklearn and you didn't post the full stack trace, but the exception suggests that it expects a list of strings as its parameter and calls lower() on each string element.

But what you are passing is a list containing a list of strings:

corpus = [1, 2, 3]
document = [' '.join(str(item)) for item in corpus]

print(document)
# ['1', '2', '3']
print([document])
# [['1', '2', '3']]

I bet the error goes away if you call this instead:

tf_idf_vector=tfidf_transformer.transform(cv.transform(document))
answered Sep 04 '25 21:09 by Jim Panse


You can use an sklearn Pipeline, which simplifies this:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.pipeline import Pipeline

tf_idf = Pipeline([
    ('cv', CountVectorizer()),
    ('tfidf_transformer', TfidfTransformer(smooth_idf=True, use_idf=True)),
])

# corpus is the list of strings built from df['all_cols'] in the question
tf_idf_vector = tf_idf.fit_transform(corpus)
answered Sep 04 '25 22:09 by qaiser