I am seeking a way to search a string for an exact match or whole word match. RegEx.Match
and RegEx.IsMatch
don't seem to get me where I want to be.
Consider the following scenario:
namespace test
{
class Program
{
static void Main(string[] args)
{
string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
int indx = str.IndexOf("TOTAL");
string amount = str.Substring(indx + "TOTAL".Length, 10);
string strAmount = Regex.Replace(amount, "[^.0-9]", "");
Console.WriteLine(strAmount);
Console.WriteLine("Press any key to continue...");
Console.ReadKey();
}
}
}
The output of the above code is:
// 34.37
// Press any key to continue...
The problem is, I don't want SUBTOTAL
, but IndexOf
finds the first occurrence of the word TOTAL
which is in SUBTOTAL
which then yields the incorrect value of 34.37.
So the question is, is there a way to force IndexOf
to find only an exact match or is there another way to force that exact whole word match so that I can find the index of that exact match and then perform some useful function with it. RegEx.IsMatch
and RegEx.Match
are, as far as I can tell, simply boolean
searches. In this case, it isn't enough to just know the exact match exists. I need to know where it exists in the string.
Any advice would be appreciated.
My method is faster than the accepted answer because it does not use Regex.
string str = "SUBTOTAL 34.37 TAX TOTAL 37.43";
var indx = str.IndexOfWholeWord("TOTAL");
public static int IndexOfWholeWord(this string str, string word)
{
for (int j = 0; j < str.Length &&
(j = str.IndexOf(word, j, StringComparison.Ordinal)) >= 0; j++)
if ((j == 0 || !char.IsLetterOrDigit(str, j - 1)) &&
(j + word.Length == str.Length || !char.IsLetterOrDigit(str, j + word.Length)))
return j;
return -1;
}
You can use word boundaries, \b
, and the Match.Index
property:
var text = "SUBTOTAL 34.37 TAX TOTAL 37.43";
var idx = Regex.Match(text, @"\bTOTAL\b").Index;
// => 19
See the C# demo.
The \bTOTAL\b
matches TOTAL
when it is not enclosed with any other letters, digits or underscores.
If you need to count a word as a whole word if it is enclosed with underscores, use
var idx = Regex.Match(text, @"(?<![^\W_])TOTAL(?![^\W_])").Index;
where (?<![^\W_])
is a negative lookbehind that fails the match if there is a character other than a non-word and underscore immediately to the left of the current location (so, there can be a start of string position, or a char that is a not a digit nor letter), and (?![^\W_])
is a similar negative lookahead that only matches if there is an end of string position or a char other than a letter or digit immediately to the right of the current location.
If the boundaries are whitespaces or start/end of string use
var idx = Regex.Match(text, @"(?<!\S)TOTAL(?!\S)").Index;
where (?<!\S)
requires start of string or a whitespace immediately on the left, and (?!\S)
requires the end of string or a whitespace on the right.
NOTE: \b
, (?<!...)
and (?!...)
are non-consuming patterns, that is the regex index does not advance when matching these patterns, thus, you get the exact positions of the word you search for.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With