
Matching plurals using regex in C#

Tags: c#, regex, plural

I'm looking to use regex in C# to search for terms and I want to include the plurals of those terms in the search. For example if the user wants to search for 'pipe' then I want to return results for 'pipes' as well.

So I can do this...

string s = "\\b" + term + "s*\\b";
if (Regex.IsMatch(bigtext, s)) {  /* do stuff */ }

How would I modify the above to allow me to match, say, 'stresses' when the user enters 'stress' and still work for 'pipe'/'pipes'?

SAL asked Apr 24 '12 11:04




2 Answers

One problem you will run into is that English has many irregular nouns, such as man, fish and index, which a simple suffix rule cannot handle. So you should consider using the PluralizationService, which has a Pluralize method. Here is an example that shows how to use it.

Once you have the plural of the term, you can easily construct a regex that matches either the singular or the plural form.

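// Needs a reference to the System.Data.Entity.Design assembly, plus:
// using System.Data.Entity.Design.PluralizationServices;
// using System.Globalization;
// using System.Text.RegularExpressions;
// Note: the service only supports English cultures.
PluralizationService ps = PluralizationService.CreateService(CultureInfo.CurrentCulture);
string plural = ps.Pluralize(term);
// Escape both forms and anchor on word boundaries so "pipe" does not match "pipeline".
string s = @"\b(?:" + Regex.Escape(term) + "|" + Regex.Escape(plural) + @")\b";
if (Regex.IsMatch(bigtext, s)) {
    /* do stuff */
}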
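
If pulling in PluralizationService is not an option, a regex-only approach may be enough for regular plurals. The following is only a sketch: the class and method names (PluralSearch, BuildPattern, ContainsTerm) are made up for illustration, and it assumes the term ends in a word character and that only the "s" and "es" suffixes matter. It will not find irregular forms such as 'men', or 'parties' for 'party'.

```csharp
using System.Text.RegularExpressions;

static class PluralSearch
{
    // Hypothetical helper: matches the term as a whole word,
    // optionally followed by "es" or "s".
    public static string BuildPattern(string term)
    {
        // Regex.Escape guards against metacharacters in user input.
        return @"\b" + Regex.Escape(term) + @"(?:es|s)?\b";
    }

    // Hypothetical helper: case-insensitive search for the term or its regular plural.
    public static bool ContainsTerm(string text, string term)
    {
        return Regex.IsMatch(text, BuildPattern(term), RegexOptions.IgnoreCase);
    }
}
```

With this, ContainsTerm(bigtext, "stress") would find 'stresses' and ContainsTerm(bigtext, "pipe") would find 'pipes', while 'pipeline' is rejected by the trailing \b.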
sch answered Nov 15 '22 18:11


Here's a regex created to remove the plurals:

 /(?<![aei])([ie][d])(?=[^a-zA-Z])|(?<=[ertkgwmnl])s(?=[^a-zA-Z])/g

(Demo & source)

I know it's not exactly what you need, but it may point you in the right direction.
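
The pattern above is written with JavaScript-style /.../g delimiters. A rough sketch of how it might be applied in C# follows; the PluralStripper class and Strip method are hypothetical names, and the pattern is copied as-is without any claim about how well it handles English in general. Regex.Replace already replaces every match, so no equivalent of the g flag is needed.

```csharp
using System.Text.RegularExpressions;

static class PluralStripper
{
    // Pattern taken from the answer, without the /.../g delimiters.
    const string Pattern =
        @"(?<![aei])([ie][d])(?=[^a-zA-Z])|(?<=[ertkgwmnl])s(?=[^a-zA-Z])";

    // Hypothetical helper: removes every match of the pattern from the text.
    public static string Strip(string text)
    {
        return Regex.Replace(text, Pattern, "");
    }
}
```

Note that both lookaheads require a non-letter character after the suffix, so a plural at the very end of the input is left untouched unless something (a space or punctuation) follows it.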

ThdK answered Nov 15 '22 19:11