
Tag Cloud in C#

Tags:

c#

tag-cloud

I am making a small C# application and would like to extract a tag cloud from simple plain text. Is there a function that could do that for me?

Layla asked Dec 10 '08 00:12


People also ask

What is a tag cloud?

A tag cloud, also known as a word cloud, wordle, or weighted list, is a visual representation of the most popular words (or tags) found in free-form text. The size of each tag, whether a single word or a collocation, is proportional to how often it appears in your text.

What is a tag cloud and how does it work?

Word clouds (also known as text clouds or tag clouds) work in a simple way: the more a specific word appears in a source of textual data (such as a speech, blog post, or database), the bigger and bolder it appears in the word cloud. A word cloud is a collection, or cluster, of words depicted in different sizes.
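As a minimal sketch of the "more frequent means bigger" rule, the helper below assumes a linear mapping from a word's count to a font size between a minimum and a maximum. The class name TagCloudSizing, the method FontSize, and the 12 to 36 point range are illustrative assumptions, not part of any library.

```csharp
using System;

// Hypothetical helper (not a library API): maps a word's count to a font size
// by linear interpolation between minSize (rarest word) and maxSize (most frequent word).
public static class TagCloudSizing
{
    public static double FontSize(int count, int minCount, int maxCount,
                                  double minSize = 12.0, double maxSize = 36.0)
    {
        if (count < minCount || count > maxCount)
            throw new ArgumentOutOfRangeException(nameof(count));

        // If every word has the same count there is no range to scale over,
        // so all words get the largest size.
        if (maxCount == minCount)
            return maxSize;

        // t is 0 for the rarest word and 1 for the most frequent one.
        double t = (double)(count - minCount) / (maxCount - minCount);
        return minSize + t * (maxSize - minSize);
    }
}
```

With counts ranging from 1 to 5, a word seen 3 times sits halfway and gets 24 points. A logarithmic scale is a common alternative when a few words dominate the counts.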

What are the characteristics of a tag cloud?

A tag cloud, or word cloud, is a visual technique based on representing how often words or tags appear within text content such as websites, articles, speeches or databases. It is a graphic representation of the words used most in the text.

What is a word cloud used for?

A word cloud is a visual representation of information or data. It shows the popularity of words or phrases by making the most frequently used words appear larger or bolder compared with the other words around them.


2 Answers

Building a tag cloud is, as I see it, a two-part process:

First, you need to split and count your tokens. Depending on how the document is structured, as well as the language it is written in, this could be as easy as counting the space-separated words. However, this is a very naive approach, as words like "the", "of" and "a" will have the highest counts and are not useful as tags. I would suggest implementing some sort of word blacklist (a stop-word list) to exclude the most common, meaningless words.
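The first step (split, filter, count) could look roughly like the sketch below. It is an assumption-laden illustration, not a complete solution: TagCounter and CountTags are hypothetical names, words are extracted with a regular expression matching runs of letters, digits and apostrophes (rather than splitting on spaces, so punctuation is dropped), and the stop-word list is a tiny sample that a real application would need to extend.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagCounter
{
    // Illustrative blacklist only; a real stop-word list would be much longer
    // and language-specific.
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "the", "of", "a", "an", "and", "to", "in", "is", "it" },
        StringComparer.OrdinalIgnoreCase);

    // Extracts words, lower-cases them, drops blacklisted words, and returns
    // (tag, count) pairs ordered by descending count, then alphabetically.
    public static List<KeyValuePair<string, int>> CountTags(string text)
    {
        return Regex.Matches(text, @"[\p{L}\p{N}']+")
            .Cast<Match>()
            .Select(m => m.Value.ToLowerInvariant())
            .Where(w => !StopWords.Contains(w))
            .GroupBy(w => w)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key, StringComparer.Ordinal)
            .ToList();
    }
}
```

For example, CountTags("The cat and the hat. The cat sat.") yields cat (2), hat (1) and sat (1); "the" and "and" are filtered out by the blacklist.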

Once you have the results as (tag, count) pairs, you could use something similar to the following code:

(Searches is a list of SearchRecordEntity objects, each holding a tag and its count; SearchTagElement is a subclass of SearchRecordEntity that adds a TagCategory property; and processedTags is a List<SearchTagElement> that holds the result.)

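using System.Collections.Generic;
using System.Linq;

// The highest count is the 100% reference point.
// Note: Max() throws InvalidOperationException if Searches is empty.
double max = Searches.Max(x => (double)x.Count);
List<SearchTagElement> processedTags = new List<SearchTagElement>();

foreach (SearchRecordEntity sd in Searches)
{
    // Copy the tag and its count into the result element, otherwise the
    // result only carries a category. "Tag" is an assumed property name;
    // only Count appears in the original code.
    var element = new SearchTagElement { Tag = sd.Tag, Count = sd.Count };

    // Express this tag's count as a percentage of the most frequent tag.
    double count = (double)sd.Count;
    double percent = (count / max) * 100;

    // Map the percentage to one of five size classes (e.g. CSS class names).
    if (percent < 20)
    {
        element.TagCategory = "smallestTag";
    }
    else if (percent < 40)
    {
        element.TagCategory = "smallTag";
    }
    else if (percent < 60)
    {
        element.TagCategory = "mediumTag";
    }
    else if (percent < 80)
    {
        element.TagCategory = "largeTag";
    }
    else
    {
        element.TagCategory = "largestTag";
    }

    processedTags.Add(element);
}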
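Since SearchRecordEntity and SearchTagElement are not shown, here is a self-contained sketch of the same bucketing step that you can run directly. It assumes a hypothetical TagCount class in place of SearchTagElement and takes (tag, count) pairs as input. Instead of the if/else chain, it computes the bucket index arithmetically as percent / 20, capped at 4, which gives the same five categories for the 20/40/60/80 thresholds used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SearchTagElement.
public class TagCount
{
    public string Tag { get; set; }
    public int Count { get; set; }
    public string TagCategory { get; set; }
}

public static class TagBucketer
{
    // Same five class names as the code above, smallest to largest.
    private static readonly string[] Categories =
        { "smallestTag", "smallTag", "mediumTag", "largeTag", "largestTag" };

    public static List<TagCount> Categorize(IEnumerable<KeyValuePair<string, int>> counts)
    {
        var list = counts.ToList();
        if (list.Count == 0)
            return new List<TagCount>(); // avoids Max() throwing on an empty sequence

        double max = list.Max(p => (double)p.Value);

        return list.Select(p =>
        {
            double percent = p.Value / max * 100;
            // [0, 20) -> 0, [20, 40) -> 1, ..., 80 and above -> 4 (100% is capped at 4).
            int bucket = Math.Min(4, (int)(percent / 20));
            return new TagCount { Tag = p.Key, Count = p.Value, TagCategory = Categories[bucket] };
        }).ToList();
    }
}
```

Feeding it the output of a word counter such as the one sketched earlier, e.g. (cat, 10), (hat, 1), (sat, 5), gives largestTag, smallestTag and mediumTag respectively.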
Ramiro Berrelleza answered Sep 21 '22 03:09


I would really recommend using http://thetagcloud.codeplex.com/. It is a very clean implementation that takes care of grouping, counting and rendering of tags. It also provides filtering capabilities.

user85742 answered Sep 20 '22 03:09