
Filtering a String based on word count

I am trying to filter a List of strings based on the number of words in each string. I assume you would trim any white-space at the ends of the string and then count the spaces that remain, so that WordCount = NumberOfSpaces + 1. Is that the most efficient way to do this? Filtering on character count works fine with the following; I just can't figure out how to write the word-count version succinctly using C#/LINQ.

if (checkBox_MinMaxChars.Checked)
{
    int minChar = int.Parse(numeric_MinChars.Text);
    int maxChar = int.Parse(numeric_MaxChars.Text);

    myList = myList.Where(x => 
                              x.Length >= minChar && 
                              x.Length <= maxChar).ToList();
}

Any ideas for counting words?

UPDATE: This worked like a charm. Thanks, Matthew:

int minWords = int.Parse(numeric_MinWords.Text);
int maxWords = int.Parse(numeric_MaxWords.Text);

sortBox1 = sortBox1.Where(x => x.Trim().Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Count() >= minWords &&
                               x.Trim().Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Count() <= maxWords).ToList();
Jeagr asked Dec 19 '12 07:12


4 Answers

I would take a simpler approach, since you have indicated that a space can reliably be used as the delimiter:

var str = "     the string to split and count        ";
var wordCount = str.Trim().Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Count();

EDIT:

If optimal performance is necessary and memory usage is a concern, you could write your own method and leverage IndexOf() (there are many ways to implement something like this; I just prefer reuse over from-scratch code design):

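    public int WordCount(string s) {
        const int DONE = -1;
        var str = s.Trim();
        // No words at all; this also avoids IndexOf throwing on an empty string.
        if (str.Length == 0) return 0;
        var wordCount = 0;
        var index = 0;
        while (index != DONE) {
            wordCount++;
            // Find the next space, then skip any run of consecutive spaces
            // so that "a  b" counts as two words, not three.
            index = str.IndexOf(' ', index + 1);
            while (index != DONE && str[index + 1] == ' ') index++;
        }
        return wordCount;
    }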
Matthew Cox answered Nov 11 '22 18:11


Your approach to counting words is OK. String.Split will give a similar result at the cost of more memory.

Then just implement your int WordCount(string text) function and pass it to Where:

myList.Where(s => WordCount(s) > minWordCount)
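
To make that concrete, here is a minimal sketch of a space-counting WordCount passed to Where, with both bounds applied. The names WordFilter and FilterByWordCount are made up for illustration and are not part of the original answer, and the sketch assumes only the space character separates words, as the question does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordFilter
{
    // Counts words by counting transitions from a space to a non-space
    // character, so no intermediate array is allocated (unlike String.Split).
    // Assumption: only ' ' separates words, as stated in the question.
    public static int WordCount(string text)
    {
        var count = 0;
        var inWord = false;
        foreach (var c in text)
        {
            if (c == ' ')
            {
                inWord = false;
            }
            else if (!inWord)
            {
                inWord = true;
                count++;
            }
        }
        return count;
    }

    // Hypothetical helper: keeps the strings whose word count
    // lies within [minWords, maxWords], inclusive.
    public static List<string> FilterByWordCount(
        IEnumerable<string> source, int minWords, int maxWords)
    {
        return source
            .Select(s => new { Text = s, Count = WordCount(s) }) // count each string once
            .Where(x => x.Count >= minWords && x.Count <= maxWords)
            .Select(x => x.Text)
            .ToList();
    }
}
```

With this in place, something like `myList = WordFilter.FilterByWordCount(myList, minWords, maxWords);` should slot into the handler from the question. Counting once per string also avoids evaluating the word count twice in the predicate.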
Alexei Levenkov answered Nov 11 '22 18:11


You want all strings with word-count in a given range?

int minCount = 10;
int maxCount = 15;
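IEnumerable<string> result = list
    // A null separator splits on whitespace; RemoveEmptyEntries keeps leading,
    // trailing, or repeated spaces from being counted as extra words.
    .Select(s => new { Text = s, Words = s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries) })
    .Where(x => x.Words.Length >= minCount
             && x.Words.Length <= maxCount)
    .Select(x => x.Text);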
Tim Schmelter answered Nov 11 '22 17:11


How about splitting the string into an array on spaces and counting the elements?

s.Split().Count()

removed the space :)
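
One caveat with a bare Split(): it splits on whitespace but keeps empty entries, so leading, trailing, or repeated spaces inflate the count. A small sketch of the difference follows; the class and method names are hypothetical and only there to show the two calls side by side.

```csharp
using System;

public static class SplitCounts
{
    // Bare Split(): splits on whitespace, but empty entries are kept,
    // so "a  b" (two spaces) yields three entries.
    public static int Naive(string s)
    {
        return s.Split().Length;
    }

    // A null separator array also means "split on whitespace";
    // RemoveEmptyEntries drops the empty strings caused by extra spaces.
    public static int IgnoringEmpty(string s)
    {
        return s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
    }
}
```

For "a  b" with two spaces, Naive returns 3 while IgnoringEmpty returns 2, so the second form is probably the safer one if the input is not already normalized.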

ufosnowcat answered Nov 11 '22 18:11