 

Find the highest occurring words in a string C#

I am trying to find the most frequently occurring words in a string.

e.g.

Hello World This is a great world, This World is simply great

From the above string, I am trying to calculate results like the following:

  • world, 3
  • great, 2
  • hello, 1
  • this, 2

but ignoring any words shorter than 3 characters, e.g. "is", which occurs twice.
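Because the sample text contains "world," (with a trailing comma), splitting on spaces alone would count "world," and "world" as different words. A minimal tokenizing sketch, assuming a "word" is a run of Unicode letters and that comparison is case-insensitive; the class `WordTokenizer` and its `minLength` parameter are hypothetical names chosen for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordTokenizer
{
    // Hypothetical helper: a "word" is assumed to be a run of letters (\p{L}+),
    // so punctuation such as the comma in "world," is never part of a word.
    public static List<string> Tokenize(string text, int minLength = 3)
    {
        return Regex.Matches(text.ToLowerInvariant(), @"\p{L}+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    // Keep only words of at least minLength characters ("is" and "a" are dropped).
                    .Where(w => w.Length >= minLength)
                    .ToList();
    }
}
```

For the sample sentence, `WordTokenizer.Tokenize(...)` yields hello, world, this, great, world, this, world, simply, great.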

I tried looking into Dictionary<key, value> pairs and into LINQ's GroupBy extension. I know the solution lies somewhere in between, but I just can't get my head around the algorithm and how to get this done.
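The Dictionary<key, value> route can be sketched as a single counting pass: each word is looked up, and its count is incremented (or started at 1). This is a sketch under stated assumptions, not the only way to do it: words are assumed to be runs of letters, lowercased before counting, and `DictionaryWordCounter.Count` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DictionaryWordCounter
{
    // Hypothetical helper: counts each lowercased word of at least minLength letters.
    public static Dictionary<string, int> Count(string text, int minLength = 3)
    {
        var counts = new Dictionary<string, int>();
        foreach (Match m in Regex.Matches(text, @"\p{L}+"))
        {
            string word = m.Value.ToLowerInvariant();
            if (word.Length < minLength)
                continue; // skip short words such as "is" and "a"

            // TryGetValue sets 'current' to 0 when the word has not been seen yet.
            counts.TryGetValue(word, out int current);
            counts[word] = current + 1;
        }
        return counts;
    }
}
```

For the sample sentence this produces world = 3, great = 2, this = 2, hello = 1, simply = 1. Sorting the dictionary entries by value afterwards gives the "top" words; the GroupBy approach does the counting and grouping in one LINQ pipeline instead.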

Thr3e asked Jan 03 '12 02:01




1 Answer

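using System;
using System.Linq;

string words = "Hello World This is a great world, This World is simply great".ToLower();

// Split on spaces and punctuation so that "world," is counted as "world",
// and keep words of 3 or more characters, as the question asks.
var results = words.Split(new[] { ' ', ',', '.', '!', '?' }, StringSplitOptions.RemoveEmptyEntries)
                   .Where(x => x.Length >= 3)
                   .GroupBy(x => x)
                   .Select(x => new { Count = x.Count(), Word = x.Key })
                   .OrderByDescending(x => x.Count);

foreach (var item in results)
    Console.WriteLine("{0} occurred {1} times", item.Word, item.Count);

Console.ReadLine();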

To get the word with the most occurrences:

var mostFrequent = results.First().Word;
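Since the title asks for the highest occurring words (plural), a variant that returns the top n words may be more useful than `First()`. The sketch below is an illustration, not part of the original answer: it assumes letter-only tokens, breaks ties alphabetically (ordinal) so the order is deterministic, and the names `TopWords.Top`, `n`, and `minLength` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TopWords
{
    // Hypothetical helper: the n most frequent words of at least minLength letters.
    public static List<KeyValuePair<string, int>> Top(string text, int n, int minLength = 3)
    {
        return Regex.Matches(text.ToLowerInvariant(), @"\p{L}+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .Where(w => w.Length >= minLength)
                    .GroupBy(w => w)
                    .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
                    .OrderByDescending(p => p.Value)
                    // Equal counts are ordered alphabetically so ties are stable and predictable.
                    .ThenBy(p => p.Key, StringComparer.Ordinal)
                    .Take(n)
                    .ToList();
    }
}
```

For the sample sentence, `TopWords.Top(text, 3)` returns world (3), great (2), this (2). Unlike `First()`, `Take(n)` does not throw when the input contains no qualifying words; it simply returns an empty list.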

Alex answered Oct 18 '22 17:10