string.IndexOf search for whole word match

I am looking for a way to search a string for an exact (whole word) match. Regex.Match and Regex.IsMatch don't seem to get me where I want to be.
Consider the following scenario:

using System;
using System.Text.RegularExpressions;

namespace test
{
    class Program
    {
        static void Main(string[] args)
        {
            string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
            // IndexOf returns the first occurrence, which is the TOTAL inside SUBTOTAL.
            int indx = str.IndexOf("TOTAL");
            string amount = str.Substring(indx + "TOTAL".Length, 10);
            // Keep only digits and the decimal point.
            string strAmount = Regex.Replace(amount, "[^.0-9]", "");

            Console.WriteLine(strAmount);
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey();
        }
    }
}

The output of the above code is:

// 34.37
// Press any key to continue...

The problem is that I don't want SUBTOTAL. IndexOf finds the first occurrence of TOTAL, which is inside SUBTOTAL, and that yields the incorrect value of 34.37.

So the question is: is there a way to force IndexOf to find only an exact match, or is there another way to force an exact whole-word match so that I can find its index and then do something useful with it? Regex.IsMatch and Regex.Match are, as far as I can tell, simply boolean searches. In this case, it isn't enough to know that the exact match exists; I need to know where it exists in the string.

Any advice would be appreciated.

D J asked Jun 26 '14 18:06


2 Answers

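My method does not use Regex, so it should be faster than the accepted answer. It counts an occurrence as a whole word only when the characters on both sides are not letters or digits, or the occurrence touches the start or end of the string.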

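using System;

string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
var indx = str.IndexOfWholeWord("TOTAL");
// => 19

// Extension methods must live in a non-nested static class.
public static class StringExtensions
{
    // Returns the index of the first occurrence of word whose neighbours
    // are not letters or digits, or -1 if there is no such occurrence.
    public static int IndexOfWholeWord(this string str, string word)
    {
        for (int j = 0; j < str.Length &&
            (j = str.IndexOf(word, j, StringComparison.Ordinal)) >= 0; j++)
            if ((j == 0 || !char.IsLetterOrDigit(str, j - 1)) &&
                (j + word.Length == str.Length || !char.IsLetterOrDigit(str, j + word.Length)))
                return j;
        return -1;
    }
}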
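
To tie this back to the question, here is a minimal sketch of how the whole-word index could feed the amount extraction. The AmountAfter helper and the class names are made up for illustration, and the length is clamped as an assumption on my part, because the question's fixed Substring length of 10 would run past the end of the string once the real TOTAL at index 19 is found.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordExtensions
{
    // Same whole-word test as in the answer: neighbours must not be letters or digits.
    public static int IndexOfWholeWord(this string str, string word)
    {
        for (int j = 0; j < str.Length &&
            (j = str.IndexOf(word, j, StringComparison.Ordinal)) >= 0; j++)
            if ((j == 0 || !char.IsLetterOrDigit(str, j - 1)) &&
                (j + word.Length == str.Length || !char.IsLetterOrDigit(str, j + word.Length)))
                return j;
        return -1;
    }
}

public static class WholeWordAmountDemo
{
    // Hypothetical helper: digits and dots found in the (up to) 10 characters
    // after the whole word, or an empty string when the word is absent.
    public static string AmountAfter(string str, string word)
    {
        int indx = str.IndexOfWholeWord(word);
        if (indx < 0) return string.Empty;

        int start = indx + word.Length;
        // Clamp the length so Substring cannot run past the end of the string.
        int length = Math.Min(10, str.Length - start);
        string amount = str.Substring(start, length);
        return Regex.Replace(amount, "[^.0-9]", "");
    }

    public static void Main()
    {
        Console.WriteLine(AmountAfter("SUBTOTAL 34.37 TAX TOTAL 37.43", "TOTAL")); // 37.43
    }
}
```

Returning an empty string for a missing word is just one option; returning null or throwing would work equally well, depending on the caller.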
palota answered Oct 09 '22 14:10


You can use word boundaries, \b, and the Match.Index property:

var text = "SUBTOTAL 34.37 TAX TOTAL 37.43";
var idx = Regex.Match(text, @"\bTOTAL\b").Index;
// => 19
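
One thing to watch for: Regex.Match always returns a Match object, even when nothing matched, so it is safer to check Match.Success before trusting Index. A minimal sketch, where the IndexOfWord helper and the class name are hypothetical and the -1 convention is borrowed from string.IndexOf:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordIndexDemo
{
    // Hypothetical helper: index of the first whole-word occurrence, or -1 when there is none.
    public static int IndexOfWord(string text, string word)
    {
        // Regex.Escape keeps characters such as '.' or '$' in the word literal.
        Match m = Regex.Match(text, @"\b" + Regex.Escape(word) + @"\b");
        return m.Success ? m.Index : -1;
    }

    public static void Main()
    {
        Console.WriteLine(IndexOfWord("SUBTOTAL 34.37 TAX TOTAL 37.43", "TOTAL")); // 19
        Console.WriteLine(IndexOfWord("SUBTOTAL 34.37", "TOTAL"));                 // -1
    }
}
```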

The \bTOTAL\b pattern matches TOTAL only when it is not immediately preceded or followed by a letter, digit or underscore.

If a word enclosed in underscores should still count as a whole word, use

var idx = Regex.Match(text, @"(?<![^\W_])TOTAL(?![^\W_])").Index;

where (?<![^\W_]) is a negative lookbehind that fails the match if the character immediately to the left of the current location is a letter or digit (so the position must be the start of the string or follow a char that is neither a letter nor a digit), and (?![^\W_]) is the corresponding negative lookahead, which succeeds only at the end of the string or before a char that is neither a letter nor a digit.

If the boundaries must be whitespace or the start/end of the string, use

var idx = Regex.Match(text, @"(?<!\S)TOTAL(?!\S)").Index;

where (?<!\S) requires start of string or a whitespace immediately on the left, and (?!\S) requires the end of string or a whitespace on the right.
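
To see how the three boundary styles differ, here is a small comparison sketch. The sample strings, the IndexOrMinusOne helper and the class name are made up for illustration; they are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // Hypothetical helper: index of the first match, or -1 when the pattern does not match.
    public static int IndexOrMinusOne(string text, string pattern)
    {
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Index : -1;
    }

    public static void Main()
    {
        string[] patterns =
        {
            @"\bTOTAL\b",                  // letters, digits and underscores block the match
            @"(?<![^\W_])TOTAL(?![^\W_])", // only letters and digits block the match
            @"(?<!\S)TOTAL(?!\S)"          // anything except whitespace blocks the match
        };

        foreach (string text in new[] { "GRAND_TOTAL 10.00", "TOTAL: 5.00 TOTAL 37.43" })
            foreach (string pattern in patterns)
                Console.WriteLine($"{text} | {pattern} | {IndexOrMinusOne(text, pattern)}");
    }
}
```

For "GRAND_TOTAL 10.00" the three patterns give -1, 6 and -1: only the underscore-tolerant pattern accepts the TOTAL after the underscore. For "TOTAL: 5.00 TOTAL 37.43" they give 0, 0 and 12: the whitespace-only pattern skips the first TOTAL because it is followed by a colon.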

NOTE: \b, (?<!...) and (?!...) are non-consuming (zero-width) patterns: the regex index does not advance when they match, so Match.Index is the exact position of the word you searched for.

Wiktor Stribiżew answered Oct 09 '22 12:10