Getting TF-IDF Scores Of Words Using Gensim

Tags:

python

gensim

tf-idf

I am trying to find the most important words in a corpus based on their TF-IDF scores.

I've been following along with the example at https://radimrehurek.com/gensim/tut2.html. Based on the output of

>>> for doc in corpus_tfidf:
...     print(doc)

it looks like the same word gets a different TF-IDF score in each document. For example:

Word 0 ("computer", based on https://radimrehurek.com/gensim/tut1.html) has a TF-IDF score of 0.5773 in Doc #1 and 0.4442 in Doc #2.
Word 10 ("graph") has a TF-IDF score of 0.7071 in Doc #7, 0.5080 in Doc #8 and 0.4588 in Doc #9.
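In gensim, TF-IDF is a per-document weight: `tfidf[bow]` returns one vector per document, so a word is expected to score differently in every document it appears in. The pure-Python sketch below reproduces that on a made-up four-document corpus. It assumes gensim's default `TfidfModel` weighting (raw term count times log2(N / df), zero weights dropped, then L2 normalization of each document vector); `tfidf_vectors` is a hypothetical helper for illustration, not part of gensim.

```python
import math
from collections import Counter


def tfidf_vectors(texts):
    """Per-document TF-IDF, assuming gensim's default weighting:
    raw term count * log2(N / df), then L2-normalized per document."""
    n_docs = len(texts)
    # document frequency: number of documents each word occurs in
    df = Counter(word for text in texts for word in set(text))
    vectors = []
    for text in texts:
        counts = Counter(text)
        weights = {w: c * math.log2(n_docs / df[w]) for w, c in counts.items()}
        # words that occur in every document get weight 0 and are dropped
        weights = {w: v for w, v in weights.items() if v != 0.0}
        norm = math.sqrt(sum(v * v for v in weights.values()))
        vectors.append({w: v / norm for w, v in weights.items()} if norm else {})
    return vectors


# Made-up toy corpus, already tokenized
texts = [
    ["human", "interface", "computer"],
    ["computer", "system", "response", "time"],
    ["graph", "trees"],
    ["graph", "minors", "trees"],
]
for i, vec in enumerate(tfidf_vectors(texts), start=1):
    print(i, {w: round(v, 4) for w, v in sorted(vec.items())})
```

Here "computer" gets 0.3333 in document 1 and 0.2774 in document 2, while "graph" gets 0.7071 in document 3 and 0.4082 in document 4, the same kind of variation the question describes.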

So here's how I am currently getting the final TF-IDF score for each word:

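import gensim

# corpus and dictionary are built as in the gensim tutorial (tut1)
tfidf = gensim.models.tfidfmodel.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]
d = {}
for doc in corpus_tfidf:
    for term_id, value in doc:
        word = dictionary.get(term_id)
        # each assignment overwrites the score from an earlier document
        d[word] = value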
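Because `d[word] = value` runs once per document, `d` ends up holding each word's score from the last document that contains it, not a corpus-wide score. A minimal sketch that keeps every per-document weight instead is below. It assumes `corpus_tfidf` yields lists of `(term_id, weight)` pairs, as `tfidf[corpus]` does, and that `id2word` can be indexed by term id like gensim's `Dictionary`; `scores_per_word` is a hypothetical helper, and the toy data are stand-ins built from the scores quoted in the question, with one word per document for brevity.

```python
from collections import defaultdict


def scores_per_word(corpus_tfidf, id2word):
    """Collect every per-document TF-IDF weight of each word instead of overwriting."""
    scores = defaultdict(list)
    for doc in corpus_tfidf:
        for term_id, weight in doc:
            scores[id2word[term_id]].append(weight)
    return dict(scores)


# Toy stand-ins for dictionary and tfidf[corpus], using the scores quoted in the question
id2word = {0: "computer", 10: "graph"}
corpus_tfidf = [
    [(0, 0.5773)],
    [(0, 0.4442)],
    [(10, 0.7071)],
    [(10, 0.5080)],
    [(10, 0.4588)],
]
print(scores_per_word(corpus_tfidf, id2word))
# {'computer': [0.5773, 0.4442], 'graph': [0.7071, 0.508, 0.4588]}
```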

Is there a better way?

Thanks in advance.

asked Apr 15 '16 17:04

user799188

1 Answer

How about using a dictionary comprehension?

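# Same result as the loop in the question: a later document's score overwrites an earlier one
d = {dictionary.get(term_id): value for doc in corpus_tfidf for term_id, value in doc}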

answered Oct 22 '22 04:10

satojkovic
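If "most important" is taken to mean a word's highest weight in any single document (an assumption: a sum or mean over documents is an equally defensible choice, and gensim does not define one corpus-wide TF-IDF score per word), the comprehension can keep the maximum instead of the last value and then sort. `top_words_by_max` is a hypothetical helper, and the toy data are stand-ins built from the scores quoted in the question.

```python
def top_words_by_max(corpus_tfidf, id2word, k=10):
    """Rank words by their highest TF-IDF weight in any single document."""
    best = {}
    for doc in corpus_tfidf:
        for term_id, weight in doc:
            word = id2word[term_id]
            # keep the larger of this document's weight and the best seen so far
            best[word] = max(weight, best.get(word, 0.0))
    # sort by weight, highest first, and keep the top k
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy stand-ins for dictionary and tfidf[corpus], using the scores quoted in the question
id2word = {0: "computer", 10: "graph"}
corpus_tfidf = [[(0, 0.5773)], [(0, 0.4442)], [(10, 0.7071)], [(10, 0.5080)], [(10, 0.4588)]]
print(top_words_by_max(corpus_tfidf, id2word, k=2))
# [('graph', 0.7071), ('computer', 0.5773)]
```

With the objects from the question this would be called as `top_words_by_max(tfidf[corpus], dictionary)`. Replacing the running `max` with a running sum would instead favor words that score moderately across many documents.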
