I am trying to find the top occurrences of words in a string.
e.g.
Hello World This is a great world, This World is simply great
From the above string I am trying to calculate how often each word occurs, but ignoring any words with a length of less than 3 characters, e.g. "is", which occurred twice.
I tried looking into Dictionary<key, value> pairs and into LINQ's GroupBy extension. I know the solution lies somewhere in between, but I just can't get my head around the algorithm and how to get this done.
For the related problem of finding the character with the highest frequency in a string, hashing is an efficient technique: the string is traversed once, and each character's ASCII code is used as an index into an array of counters, so the whole count takes a single pass over the string.
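The array-of-counters idea described above can be sketched roughly as follows. This is only an illustration, not code from the original answer: the class name CharFrequency and the method MostFrequentChar are hypothetical, and it assumes the input is ASCII (characters outside 0-127 are simply skipped).

```csharp
using System;

public static class CharFrequency
{
    // Hypothetical helper: returns the most frequent ASCII character in text.
    // Assumption: only codes 0-127 are counted; anything else is ignored.
    // On a tie, the character with the lowest code wins.
    public static char MostFrequentChar(string text)
    {
        if (string.IsNullOrEmpty(text))
            throw new ArgumentException("text must not be empty", nameof(text));

        // One counter per ASCII code; the character itself is the index.
        int[] counts = new int[128];
        foreach (char c in text)
        {
            if (c < 128)
                counts[c]++;
        }

        // Scan the counters for the largest value.
        int best = 0;
        for (int i = 1; i < counts.Length; i++)
        {
            if (counts[i] > counts[best])
                best = i;
        }
        return (char)best;
    }
}
```

Note that this counts characters, not words, so it does not answer the question directly; for words, a Dictionary or GroupBy plays the role of the counter array.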
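using System;
using System.Linq;

// Lowercase first so "World" and "world" count as the same word.
string words = "Hello World This is a great world, This World is simply great".ToLower();
// Split on spaces and commas so "world," is counted as "world".
var results = words.Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries)
    .Where(x => x.Length >= 3) // ignore words shorter than 3 characters
    .GroupBy(x => x)
    .Select(x => new { Count = x.Count(), Word = x.Key })
    .OrderByDescending(x => x.Count);
foreach (var item in results)
    Console.WriteLine("{0} occurred {1} times", item.Word, item.Count);
Console.ReadLine();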
To get the word with the most occurrences:
string topWord = results.First().Word;
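Since the question also mentions Dictionary<key, value>, here is a rough sketch of the same count done with a dictionary instead of GroupBy. The class name WordFrequency, the method CountWords, the minLength parameter and the list of separator characters are all assumptions made for illustration; they are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordFrequency
{
    // Hypothetical helper: counts how often each word occurs in text,
    // skipping words shorter than minLength. Keys compare case-insensitively,
    // so "World" and "world" share one entry.
    public static Dictionary<string, int> CountWords(string text, int minLength)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);

        // Assumed separator set; extend it if the input has other punctuation.
        char[] separators = { ' ', ',', '.', ';', ':', '!', '?' };

        foreach (string word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (word.Length < minLength)
                continue;

            // TryGetValue leaves current at 0 when the word is not present yet.
            counts.TryGetValue(word, out int current);
            counts[word] = current + 1;
        }
        return counts;
    }
}
```

The top word could then be read with something like WordFrequency.CountWords(text, 3).OrderByDescending(p => p.Value).First().Key. Whether this is preferable to GroupBy is mostly a matter of taste; the dictionary version avoids allocating a group per word, which may matter for very large inputs.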