
AttributeError: 'str' object has no attribute 'words'

I'm using Python 3.4. I want to get the frequency of words from a CSV file, but my script raises an error. Here is my code. Can anyone help me solve this problem?

from textblob import TextBlob as tb
import math

words={}
def tfidf(word, blob, bloblist):
    return tf(word, blob) * idf(word, bloblist)

def tf(word, blob):
    return blob.words.count(word) / len(blob.words)

def n_containing(word, bloblist):
    return sum(1 for blob in bloblist if word in blob)

def idf(word, bloblist):
    return math.log(len(bloblist) / (1 + n_containing(words, bloblist)))

bloblist = open('afterstopwords.csv', 'r').read()

for i, blob in enumerate(bloblist):
     print("Top words in document {}".format(i + 1))
     scores = {word: tfidf(word, blob, bloblist) for word in blob.words}
     sorted_words = sorted(scores.items(), key=lambda x: x[1], reverse=True)
     for word, score in sorted_words[:3]:
         print("\tWord: {}, TF-IDF: {}".format(word, round(score, 5)))

And the error is:

 Top words in document 1
 Traceback (most recent call last):
 File "D:\Python34\tfidf.py", line 45, in <module>
    scores = {word: tfidf(word, blob, bloblist) for word in blob.words}
 AttributeError: 'str' object has no attribute 'words'
Anaya asked Apr 01 '26 02:04


1 Answer

In the tutorial at http://stevenloria.com/finding-important-words-in-a-document-using-tf-idf/, bloblist is built like this:

bloblist = [document1, document2, document3]

Don't change it. It is preceded by code that creates each document, like:

document1 = tb("""blablabla""")
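Applied to the question: open('afterstopwords.csv', 'r').read() returns a single str, so enumerate(bloblist) yields one character at a time, and a character has no .words attribute. The question's idf() also passes the global dict words where it presumably means word. Below is a minimal sketch of the same TF-IDF calculation, assuming each CSV row is one document. It swaps TextBlob's .words for a plain str.split() so it runs with the standard library only, and it counts document frequencies once up front instead of rescanning every document for every word. The function names and the sample rows are made up for illustration.

```python
import csv
import io
import math
from collections import Counter


def load_documents(csv_file):
    """Return one token list per non-empty CSV row (assumption: one row = one document)."""
    documents = []
    for row in csv.reader(csv_file):
        tokens = " ".join(row).lower().split()
        if tokens:
            documents.append(tokens)
    return documents


def tfidf_scores(documents):
    """Return a {word: score} dict for each document."""
    # Count, once, how many documents contain each word.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    n_docs = len(documents)
    all_scores = []
    for tokens in documents:
        counts = Counter(tokens)
        scores = {}
        for word, count in counts.items():
            tf = count / len(tokens)
            # Same smoothing as the question's idf(): 1 + number of documents containing the word.
            idf = math.log(n_docs / (1 + doc_freq[word]))
            scores[word] = tf * idf
        all_scores.append(scores)
    return all_scores


# Hypothetical in-memory stand-in for open('afterstopwords.csv', newline='')
sample = io.StringIO("python csv words\npython error traceback\nfrequency words count\n")
documents = load_documents(sample)
for i, scores in enumerate(tfidf_scores(documents)):
    print("Top words in document {}".format(i + 1))
    top = sorted(scores.items(), key=lambda x: x[1], reverse=True)[:3]
    for word, score in top:
        print("\tWord: {}, TF-IDF: {}".format(word, round(score, 5)))
```

To stay with TextBlob, the equivalent change would be to build bloblist as a list with one tb(...) object per row. Note that with only one document in bloblist, every word gets the same idf of log(1/2), so the scores carry no more information than the term frequencies.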

Here's what I did. I use my own function for opening files in Python; openfile handles the file details and returns the text.

# openfile() is my own helper; it returns the file's contents as a string
txt = openfile()
document1 = tb(txt)
bloblist = [document1]

The rest of the original code is unchanged. This works, but I have only been able to get it to finish on small files; it takes much too long on larger ones, and the results don't look accurate at all. For word counts I use https://rmtheis.wordpress.com/2012/09/26/count-word-frequency-with-python/ instead. It ran very quickly on 9999 rows of 50-75 characters each, and it seems accurate too: the results look equivalent to wordcloud results.
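If a plain word count is all that's needed, the standard library's collections.Counter may be enough. This is a rough sketch, not the code from the linked post, and it assumes the words in each CSV cell are separated by whitespace. The function name and sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def word_frequencies(csv_file):
    """Count whitespace-separated words across every cell of a CSV file."""
    counts = Counter()
    for row in csv.reader(csv_file):
        for cell in row:
            counts.update(cell.lower().split())
    return counts


# Hypothetical in-memory stand-in for open('afterstopwords.csv', newline='')
sample = io.StringIO("python csv words\npython error,words\n")
freq = word_frequencies(sample)
for word, count in freq.most_common(3):
    print(word, count)
```

Counter makes a single pass over the data, which is consistent with the speed reported above for a file of about ten thousand short rows.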

JNault answered Apr 03 '26 18:04


