I am generating a word cloud directly from a text file using the wordcloud package in Python. Here is the code I am reusing from Stack Overflow:
import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS
def random_color_func(word=None, font_size=None, position=None, orientation=None, font_path=None, random_state=None):
    # Same hue (63) and full saturation for every word; only the lightness
    # varies, picked at random between about 23% and 47%.
    h = int(360.0 * 45.0 / 255.0)
    s = int(100.0 * 255.0 / 255.0)
    l = int(100.0 * float(random_state.randint(60, 120)) / 255.0)
    return "hsl({}, {}%, {}%)".format(h, s, l)
# Read the whole file; the with-block closes the file afterwards.
with open("xyz.txt") as f:
    file_content = f.read()
wordcloud = WordCloud(font_path=r'C:\Windows\Fonts\Verdana.ttf',
                      stopwords=STOPWORDS,
                      background_color='white',
                      width=1200,
                      height=1000,
                      color_func=random_color_func
                      ).generate(file_content)
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis('off')
plt.show()
This gives me a word cloud of single words. Is there a parameter of the WordCloud() constructor that produces n-grams without preprocessing the text file?
I want a word cloud of bigrams, or of words joined with an underscore in the display, like machine_learning (currently machine and learning appear as two separate words).
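Bigram word clouds can be generated by lowering the collocation_threshold parameter of WordCloud. This only works while collocations is enabled, which is the default (collocations=True). Note that a detected bigram is drawn as two words separated by a space (machine learning), not with an underscore.
Edit the WordCloud call: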
wordcloud = WordCloud(font_path=r'C:\Windows\Fonts\Verdana.ttf',
                      stopwords=STOPWORDS,
                      background_color='white',
                      width=1200,
                      height=1000,
                      color_func=random_color_func,
                      collocation_threshold=3  # added to the code from the question; try values between 1 and 50
                      ).generate(file_content)
For more info:
collocation_threshold: int, default=30 Bigrams must have a Dunning likelihood collocation score greater than this parameter to be counted as bigrams. Default of 30 is arbitrary.
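To see why lowering the threshold yields more bigrams, here is a sketch of Dunning's log-likelihood ratio for a word pair, written from the standard binomial formulation. wordcloud's tokenization code computes a score of this kind from unigram and bigram counts and keeps a pair when the score is greater than collocation_threshold, but the code below is an independent sketch, not the library's source, so check the linked source for its exact counting and normalization. The function names are hypothetical.

```python
import math


def _log_binomial(k, n, x):
    """Log-likelihood of k successes in n trials with success probability x.

    The binomial coefficient is omitted because it cancels in the ratio.
    Probabilities are clamped away from 0 to avoid log(0).
    """
    return k * math.log(max(x, 1e-10)) + (n - k) * math.log(max(1 - x, 1e-10))


def dunning_bigram_score(count_bigram, count_w1, count_w2, n_words):
    """Dunning log-likelihood ratio (G^2) for the bigram (w1, w2).

    H0: P(w2 | w1) == P(w2 | not w1), i.e. the words are independent.
    H1: the two conditional probabilities differ.
    Larger scores mean stronger evidence that "w1 w2" is a collocation.
    """
    if n_words <= count_w1 or n_words <= count_w2:
        return 0.0  # a word covering the whole text gives no evidence
    p = count_w2 / n_words                                     # P(w2) under H0
    p1 = count_bigram / count_w1                               # P(w2 | w1) under H1
    p2 = (count_w2 - count_bigram) / (n_words - count_w1)      # P(w2 | not w1) under H1
    log_lambda = (
        _log_binomial(count_bigram, count_w1, p)
        + _log_binomial(count_w2 - count_bigram, n_words - count_w1, p)
        - _log_binomial(count_bigram, count_w1, p1)
        - _log_binomial(count_w2 - count_bigram, n_words - count_w1, p2)
    )
    return -2.0 * log_lambda


if __name__ == "__main__":
    # Two words that each occur 10 times in a 1,000-word text.
    print(dunning_bigram_score(10, 10, 10, 1000))  # always together
    print(dunning_bigram_score(1, 10, 10, 1000))   # together only once
```

With these toy counts the pair scores roughly 112 when the two words always appear together, comfortably above the default of 30, but only about 3 when they appear together once, so only a very low threshold would let such a weak pair through.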
You can also find the source code for wordcloud.WordCloud here: https://amueller.github.io/word_cloud/_modules/wordcloud/wordcloud.html
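Lowering collocation_threshold still renders each bigram with a space between the words. If you specifically want machine_learning with an underscore, one option (my assumption, not a WordCloud parameter) is to count adjacent word pairs yourself and pass the counts to WordCloud.generate_from_frequencies, which draws each dictionary key as given. The helper underscore_bigram_frequencies and the small stopword set below are hypothetical; the tokenizer is deliberately simple (lowercase letters only).

```python
import re
from collections import Counter

# Hypothetical stopword set for this example; wordcloud's STOPWORDS
# (a set of lowercase words) can be passed in instead.
EXAMPLE_STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "i"}


def underscore_bigram_frequencies(text, stopwords=EXAMPLE_STOPWORDS):
    """Count adjacent word pairs, joining each pair with an underscore.

    Pairs are built from the full token list first and any pair containing
    a stopword is skipped, so a pair never bridges a removed word.
    """
    words = re.findall(r"[a-z]+", text.lower())
    pairs = zip(words, words[1:])
    return Counter(
        f"{first}_{second}"
        for first, second in pairs
        if first not in stopwords and second not in stopwords
    )


if __name__ == "__main__":
    sample = "Machine learning is fun. I study machine learning and deep learning."
    print(underscore_bigram_frequencies(sample).most_common(3))
```

The resulting Counter (a dict subclass) can then replace the generate(file_content) call, for example WordCloud(..., color_func=random_color_func).generate_from_frequencies(underscore_bigram_frequencies(file_content, STOPWORDS)). This shows only bigrams; add single-word counts to the same Counter if you want both.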