I am writing code for a search results page that needs to highlight search terms. The terms happen to occur within table cells (the app is iterating through GridView Row Cells), and these table cells may have HTML.
Currently, my code looks like this (relevant hunks shown below):
const string highlightPattern = @"<span class=""Highlight"">$0</span>";
DataBoundLiteralControl litCustomerComments = (DataBoundLiteralControl)e.Row.Cells[CUSTOMERCOMMENTS_COLUMN].Controls[0];
// Turn "term1 term2" into "(term1|term2)"
string spaceDelimited = txtTextFilter.Text.Trim();
string pipeDelimited = string.Join("|", spaceDelimited.Split(new[] {" "}, StringSplitOptions.RemoveEmptyEntries));
string searchPattern = "(" + pipeDelimited + ")";
// Highlight search terms in Customer - Comments column
e.Row.Cells[CUSTOMERCOMMENTS_COLUMN].Text = Regex.Replace(litCustomerComments.Text, searchPattern, highlightPattern, RegexOptions.IgnoreCase);
Amazingly it works. BUT, sometimes the text I am matching on is HTML that looks like this:
<span class="CustomerName">Fred</span> was a classy individual.
And if you search for "class" I want the highlight code to wrap the "class" in "classy" but of course not the HTML attribute "class" that happens to be in there! If you search for "Fred", that should be highlighted.
So what's a good regex that will make sure matches happen only OUTSIDE the html tags? It doesn't have to be super hardcore. Simply making sure the match is not between < and > would work fine, I think.
This regex should do the job : (?<!<[^>]*)(regex you want to check: Fred|span)
It checks that it is impossible to match the regex <[^>]*
going backward starting from a matching string.
Modified code below:
const string notInsideBracketsRegex = @"(?<!<[^>]*)";
const string highlightPattern = @"<span class=""Highlight"">$0</span>";
DataBoundLiteralControl litCustomerComments = (DataBoundLiteralControl)e.Row.Cells[CUSTOMERCOMMENTS_COLUMN].Controls[0];
// Turn "term1 term2" into "(term1|term2)"
string spaceDelimited = txtTextFilter.Text.Trim();
string pipeDelimited = string.Join("|", spaceDelimited.Split(new[] {" "}, StringSplitOptions.RemoveEmptyEntries));
string searchPattern = "(" + pipeDelimited + ")";
searchPattern = notInsideBracketsRegex + searchPattern;
// Highlight search terms in Customer - Comments column
e.Row.Cells[CUSTOMERCOMMENTS_COLUMN].Text = Regex.Replace(litCustomerComments.Text, searchPattern, highlightPattern, RegexOptions.IgnoreCase);
You can use a regex with balancing groups and backreferences, but I strongly recommend that you use a parser here.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With