I am making a small C# application and would like to extract a tag cloud from simple plain text. Is there a function that could do that for me?
A tag cloud, also known as a word cloud, wordle, or weighted list, is a visual representation of the most popular words (or tags) found in free-form text. The size of each word or tag (and of multi-word collocations) is proportional to how often it appears in your text.
Word clouds (also known as text clouds or tag clouds) work in a simple way: the more a specific word appears in a source of textual data (such as a speech, blog post, or database), the bigger and bolder it appears in the word cloud. A word cloud is a collection, or cluster, of words depicted in different sizes.
A tag cloud (or word cloud) is a visual technique that represents how often words or tags appear in text content such as websites, articles, speeches or databases. It is a graphical representation of the words used most often in the text.
A word cloud is a visual representation of information or data. It shows the popularity of words or phrases by making the most frequently used words appear larger or bolder compared with the other words around them.
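To make "size proportional to frequency" concrete, one common approach (an assumption here; none of the answers prescribes a formula) is to interpolate linearly between a minimum and a maximum font size. The class name `TagSizing` and the default sizes of 10 and 40 are hypothetical, chosen only for illustration.

```csharp
// Hypothetical helper: maps a word's count to a font size that grows linearly
// from minFont (least frequent word) to maxFont (most frequent word).
public static class TagSizing
{
    public static double FontSize(int count, int minCount, int maxCount,
                                  double minFont = 10.0, double maxFont = 40.0)
    {
        // If every word has the same count, give them all the middle size.
        if (maxCount == minCount)
            return (minFont + maxFont) / 2.0;

        // Position of this count within the observed range, from 0.0 to 1.0.
        double t = (double)(count - minCount) / (maxCount - minCount);
        return minFont + t * (maxFont - minFont);
    }
}
```

For counts ranging from 1 to 9, a word seen 5 times would get a size of 25. A logarithmic scale is a common alternative when a few words dominate the counts.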
Building a tag cloud is, as I see it, a two-part process:
First, you need to split the text into tokens and count them. Depending on how the document is structured, and on the language it is written in, this could be as easy as counting space-separated words. However, this is a very naive approach, as words like "the", "of" and "a" will have the highest counts and are not very useful as tags. I would suggest implementing some sort of word blacklist (a stop-word list) in order to exclude the most common and meaningless words.
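A minimal sketch of this first step, assuming that splitting on runs of letters is good enough for the text at hand (it is not for languages written without spaces between words). The class name `WordCounter` and the short stop-word list are illustrative, not taken from the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordCounter
{
    // A tiny illustrative blacklist (stop-word list); a real one would be much longer.
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "the", "of", "a", "an", "and", "to", "in", "is", "it" },
        StringComparer.OrdinalIgnoreCase);

    // Splits text into runs of letters/apostrophes, lowercases them, drops stop words,
    // and returns (tag, count) pairs ordered from most to least frequent.
    public static List<KeyValuePair<string, int>> CountTags(string text)
    {
        return Regex.Matches(text, @"[\p{L}']+")
            .Cast<Match>()
            .Select(m => m.Value.ToLowerInvariant())
            .Where(w => !StopWords.Contains(w))
            .GroupBy(w => w)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal)
            .ToList();
    }
}
```

For example, "The cat and the dog. The cat sat." yields cat = 2, dog = 1 and sat = 1, since "the" and "and" are filtered out.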
Once you have the result as (tag, count) pairs, you could use something similar to the following code to assign each tag a size category:
(Searches is a list of SearchRecordEntity objects, each holding a tag and its count; SearchTagElement is a subclass of SearchRecordEntity that adds a TagCategory property; and processedTags is a List<SearchTagElement> that holds the result.)
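The answer does not show these types. A minimal sketch of what they might look like follows; only `Count` and `TagCategory` are named in the original, so the `Tag` property name and the `int` type of `Count` are assumptions.

```csharp
// Hypothetical shapes for the types described above.
public class SearchRecordEntity
{
    public string Tag { get; set; }   // assumed property name for the tag text
    public int Count { get; set; }    // how often the tag occurs
}

public class SearchTagElement : SearchRecordEntity
{
    // Size class such as "smallestTag" or "largestTag", e.g. for use as a CSS class.
    public string TagCategory { get; set; }
}
```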
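using System.Collections.Generic;
using System.Linq;

// Searches must be non-empty: Max() throws on an empty sequence.
// Assumes SearchRecordEntity exposes a Tag property (name assumed) and a numeric Count.
double max = Searches.Max(x => (double)x.Count);
List<SearchTagElement> processedTags = new List<SearchTagElement>();
foreach (SearchRecordEntity sd in Searches)
{
    // Copy the tag and its count into the result element.
    var element = new SearchTagElement { Tag = sd.Tag, Count = sd.Count };

    // This tag's count as a percentage of the most frequent tag's count.
    double percent = ((double)sd.Count / max) * 100;

    // Bucket the tag into one of five size categories in 20% steps.
    if (percent < 20)
    {
        element.TagCategory = "smallestTag";
    }
    else if (percent < 40)
    {
        element.TagCategory = "smallTag";
    }
    else if (percent < 60)
    {
        element.TagCategory = "mediumTag";
    }
    else if (percent < 80)
    {
        element.TagCategory = "largeTag";
    }
    else
    {
        element.TagCategory = "largestTag";
    }
    processedTags.Add(element);
}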
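The same 20% bucketing can be written as a self-contained function over a plain dictionary of (tag, count) pairs, which makes it easy to try without the entity classes. This is a sketch, not the answer's code: the name `TagBuckets.Categorize` is hypothetical, and counts are assumed to be positive.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TagBuckets
{
    // Maps each tag to one of five size categories based on its count
    // as a percentage of the largest count (assumes all counts are positive).
    public static Dictionary<string, string> Categorize(IDictionary<string, int> counts)
    {
        var result = new Dictionary<string, string>();
        if (counts.Count == 0)
            return result; // Max() would throw on an empty sequence

        double max = counts.Values.Max();
        foreach (var pair in counts)
        {
            double percent = pair.Value / max * 100;
            string category =
                percent < 20 ? "smallestTag" :
                percent < 40 ? "smallTag" :
                percent < 60 ? "mediumTag" :
                percent < 80 ? "largeTag" :
                               "largestTag";
            result[pair.Key] = category;
        }
        return result;
    }
}
```

With counts a = 10, b = 5 and c = 1, the tags land in largestTag, mediumTag and smallestTag respectively.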
I would really recommend using http://thetagcloud.codeplex.com/. It is a very clean implementation that takes care of grouping, counting and rendering of tags. It also provides filtering capabilities.