I need to know whether a coherence score of 0.4 is good or bad. I use LDA as the topic modelling algorithm.
What is the average coherence score in this context?
There is no single threshold that determines whether a coherence score is good or bad. The score and its interpretation depend on the data it is calculated from. For instance, a score of 0.5 might be good enough in one case but not acceptable in another. The only general rule is that we want to maximize this score.
Coherence measures have been proposed in the NLP community to evaluate the topics constructed by a topic model. In a more general setting, coherence measures have been discussed in the philosophy of science as a formalism to quantify the hanging and fitting together of information pieces [3].
Coherence measures the relative distance between words within a topic. There are two major types: C_V, which typically falls in 0 < x < 1, and UMass, which falls in -14 < x < 14. It is rare to see a coherence of 1 or above 0.9 unless the words being measured are either identical words or bigrams: "United" and "States" would likely return a coherence score of ~0.94, while "hero" and "hero" would return a coherence of 1. The overall coherence score of a topic is the average of the distances between its words. I try to attain 0.7 in my LDAs when using C_V; I think that indicates a strong topic correlation. I would say:
0.3 is bad
0.4 is low
0.55 is okay
0.65 might be as good as it is going to get
0.7 is nice
0.8 is unlikely, and
0.9 is probably wrong
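For concreteness, here is a minimal sketch of computing both scores with gensim's CoherenceModel. The variable names are illustrative, and texts is assumed to be a list of tokenized documents:
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

# texts: a list of tokenized documents, e.g. [["united", "states", ...], ...]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(doc) for doc in texts]
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=10, random_state=123)

# C_V needs the raw tokenized texts; UMass only needs the bag-of-words corpus.
cv = CoherenceModel(model=lda, texts=texts, dictionary=dictionary, coherence='c_v')
umass = CoherenceModel(model=lda, corpus=corpus, dictionary=dictionary, coherence='u_mass')
print("C_V:", cv.get_coherence())       # typically 0 < x < 1
print("UMass:", umass.get_coherence())  # negative; closer to 0 is better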
Low coherence fixes:
adjust your parameters: alpha = .1, beta = .01 or .001, random_state = 123, etc. (see the parameter-tuning sketch after the code below)
get better data
at 0.4 you probably have the wrong number of topics. Check out https://datascienceplus.com/evaluation-of-topic-modeling-topic-coherence/ for what is known as the elbow method: it gives you a graph of the optimal number of topics for greatest coherence in your data set. I'm using Mallet, which has pretty good coherence. Here is code to check coherence for different numbers of topics:
import gensim
from gensim.models import CoherenceModel

# Note: gensim.models.wrappers.LdaMallet requires gensim 3.x (the wrapper was
# removed in gensim 4.0) plus a local Mallet install, with mallet_path set to
# the Mallet binary.
def compute_coherence_values(dictionary, corpus, texts, limit, start=2, step=3):
    """
    Compute c_v coherence for various numbers of topics

    Parameters:
    ----------
    dictionary : Gensim dictionary
    corpus : Gensim corpus
    texts : List of input texts
    limit : Max num of topics

    Returns:
    -------
    model_list : List of LDA topic models
    coherence_values : Coherence values corresponding to the LDA model with the respective number of topics
    """
    coherence_values = []
    model_list = []
    for num_topics in range(start, limit, step):
        model = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=num_topics, id2word=dictionary)
        model_list.append(model)
        coherencemodel = CoherenceModel(model=model, texts=texts, dictionary=dictionary, coherence='c_v')
        coherence_values.append(coherencemodel.get_coherence())
    return model_list, coherence_values
# Can take a long time to run.
model_list, coherence_values = compute_coherence_values(dictionary=id2word, corpus=corpus, texts=data_lemmatized, start=2, limit=40, step=6)

# Show graph
import matplotlib.pyplot as plt

limit = 40; start = 2; step = 6
x = range(start, limit, step)
plt.plot(x, coherence_values)
plt.xlabel("Num Topics")
plt.ylabel("Coherence score")
plt.legend(["coherence_values"], loc='best')  # legend labels must be a sequence, not a bare string
plt.show()
# Print the coherence scores
for m, cv in zip(x, coherence_values):
    print("Num Topics =", m, " has Coherence Value of", round(cv, 4))

# Select the model and print the topics
from pprint import pprint

optimal_model = model_list[3]  # index 3 corresponds to num_topics=20 here (2, 8, 14, 20, ...)
model_topics = optimal_model.show_topics(formatted=False)
pprint(optimal_model.print_topics(num_words=10))
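As for the parameter-tuning fix in the list above, here is a minimal sketch of passing alpha and beta to gensim's own LdaModel (eta is gensim's name for beta). The values shown are the illustrative ones from the list, not tuned recommendations, and corpus/id2word are assumed to exist as above:
from gensim.models import LdaModel

# alpha controls document-topic sparsity, eta (beta) controls topic-word
# sparsity, and random_state makes runs repeatable.
lda = LdaModel(
    corpus=corpus,
    id2word=id2word,
    num_topics=20,
    alpha=0.1,
    eta=0.01,  # or try 0.001
    random_state=123,
    passes=10,
)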
I hope this helps :)
In addition to the excellent answer from Sara:
UMass coherence measures how often two words (Wi, Wj) were seen together in the corpus. The pairwise score is defined as:
score(Wi, Wj) = log( (D(Wi, Wj) + EPSILON) / D(Wi) )
where:
D(Wi, Wj) is how many times words Wi and Wj appeared together,
D(Wi) is how many times word Wi appeared alone in the corpus, and
EPSILON is a small value (like 1e-12) added to the numerator to avoid zero values.
If Wi and Wj never appear together, this would result in log(0), which would break the computation; the EPSILON value is essentially a hack to fix this.
In conclusion, you can get a value anywhere from a very large negative number all the way up to approximately 0. The interpretation is the same as Sara wrote: the greater the number, the better, while a score of exactly 0 would obviously be wrong, since it would mean Wj co-occurs every single time Wi appears.
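A minimal sketch of that pairwise score on a toy corpus (the helper name and documents are illustrative, not from any library):
import math

EPSILON = 1e-12

def umass_score(docs, wi, wj):
    # D(Wi): number of documents containing Wi
    d_wi = sum(1 for doc in docs if wi in doc)
    # D(Wi, Wj): number of documents containing both Wi and Wj
    d_wi_wj = sum(1 for doc in docs if wi in doc and wj in doc)
    return math.log((d_wi_wj + EPSILON) / d_wi)

docs = [{"united", "states", "president"},
        {"united", "states", "congress"},
        {"united", "nations"}]
print(umass_score(docs, "united", "states"))     # log(2/3) ~ -0.41: frequent co-occurrence
print(umass_score(docs, "united", "president"))  # log(1/3) ~ -1.10: rarer co-occurrence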