
Regex - How to search for singular or plural version of word [duplicate]

I'm trying to write what should be a simple regular expression: all I want is to match the singular part of a word, whether or not it has an s on the end. So if I have the following words:

test
tests

EDIT: Further examples. I need this to work for many words, not just those two:

movie
movies
page
pages
time
times

For all of them I need to get the word without the s on the end, but I can't find a regular expression that always captures the part before the s and works in both cases.

I've tried the following:

([a-zA-Z]+)([s\b]{0,}) - This returns the full word as the first match in both cases
([a-zA-Z]+?)([s\b]{0,}) - This returns 3 different matching groups for both words
([a-zA-Z]+)([s]?) - This returns the full word as the first match in both cases
([a-zA-Z]+)(s\b) - This works for tests but doesn't match test at all
([a-zA-Z]+)(s\b)? - This returns the full word as the first match in both cases
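
To make the behaviour of these attempts concrete, here is a minimal sketch that runs two of them in Python. It assumes Python's `re` module treats these constructs the same way the RegExr tester does; the helper name `first_group` is made up for illustration.

```python
import re

# Two of the attempted patterns from the list above.
GREEDY_OPTIONAL_S = re.compile(r"([a-zA-Z]+)([s]?)")
REQUIRED_S = re.compile(r"([a-zA-Z]+)(s\b)")

def first_group(pattern, word):
    """Return capture group 1 of the first match, or None if nothing matches."""
    m = pattern.search(word)
    return m.group(1) if m else None

for word in ("test", "tests"):
    # The greedy + swallows the trailing s, so group 1 is the full word.
    print(word, "greedy:", first_group(GREEDY_OPTIONAL_S, word))
    # A mandatory s\b only matches when the word really ends in s.
    print(word, "required s:", first_group(REQUIRED_S, word))
```

The greedy `+` is what defeats the optional `s`: the first group consumes the whole word and the second group is happy to match nothing.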

I've been using http://gskinner.com/RegExr/ to try out the different regexes.

EDIT: This is for a Sublime Text snippet. For those who don't know, a snippet in Sublime Text is a shortcut: I can type, say, the name of my database and hit "run snippet", and it will turn it into something like:

$movies = $this->ci->db->get_where("movies", "");
if ($movies->num_rows()) {
    foreach ($movies->result() AS $movie) {

    }
}

All I need is to turn "movies" into "movie" and have it auto-inserted into the foreach loop.

This means I can't just do a find and replace on the text. I also only need to take 60-70 words into account (it only runs against my own tables, not every word in the English language).

Thanks! - Tim

Tim asked Jul 10 '12 02:07


3 Answers

Ok I've found a solution:

([a-zA-Z]+?)(s\b|\b)

This works as desired: the lazy +? makes the first group stop as early as it can, so a trailing s is left for the second group. You can then simply use the first capture group as the unpluralized version of the word.

Thanks @Jahroy for helping me find it. I added this as an answer for future surfers who just want a solution, but please check out Jahroy's comment for more in-depth information.
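
As a quick check, the pattern can be exercised outside Sublime Text. This is only a sketch in Python, assuming its `re` module handles the lazy quantifier and `\b` the same way the snippet engine does; `singularize` is a hypothetical helper name.

```python
import re

# The pattern from this answer: a lazy run of letters, then either
# a final "s" at a word boundary or just the word boundary itself.
SINGULAR = re.compile(r"([a-zA-Z]+?)(s\b|\b)")

def singularize(word):
    """Return capture group 1, i.e. the word without a trailing s."""
    m = SINGULAR.match(word)
    return m.group(1) if m else word

for word in ("test", "tests", "movie", "movies", "pages", "times"):
    print(word, "->", singularize(word))
```

Note that the pattern strips any final s, so a singular name that happens to end in s (for example "status") would lose it too. With a known list of 60-70 table names that is easy to check by hand.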

Tim answered Nov 14 '22 21:11


For simple plurals, use this:

test(?=s| |$)

For more complex plurals, you're in trouble using regex. For example, this regex

part(y|i)(?=es | )

will return "party" or "parti", but I'm not sure what you would do with that.
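
A short sketch of how these two lookahead patterns behave, written in Python on the assumption that its `re` module matches the flavour used here; the helper name `matched_text` and the sample strings are invented for illustration.

```python
import re

# The two patterns from this answer. The lookahead (?=...) checks what
# follows the match without including it in the matched text.
SIMPLE = re.compile(r"test(?=s| |$)")
COMPLEX = re.compile(r"part(y|i)(?=es | )")

def matched_text(pattern, text):
    """Return the text matched by the pattern, or None if there is no match."""
    m = pattern.search(text)
    return m.group(0) if m else None

print(matched_text(SIMPLE, "tests"))          # followed by s
print(matched_text(SIMPLE, "test"))           # followed by end of string
print(matched_text(SIMPLE, "testing"))        # followed by i, so no match
print(matched_text(COMPLEX, "party time"))    # followed by a space
print(matched_text(COMPLEX, "parties here"))  # followed by "es "
```

Because the second pattern's lookahead requires a space, it does not match "party" or "parties" at the very end of a string.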

Bohemian answered Nov 14 '22 22:11


Here's how you can do it with vi or sed:

s/\([A-Za-z]*\)[sS]$/\1/

That matches a run of letters ending in s (or S) at the end of the line and replaces it with everything but that last letter.

NOTE:

The escape chars (backslashes before the parens) might be different in different contexts.

ALSO:

The \1 (a backreference to the first captured group) may also vary depending on context.

ALSO:

This will only work if your word is the only word on the line.

If your table name is one of many words on the line, you could probably replace the $ (which stands for the end of the line) with a pattern that matches whitespace or a word boundary (the syntax for these differs between tools).
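
As one possible reading of that last suggestion, here is a sketch in Python rather than vi or sed, using `\b` as the word boundary. The name `strip_plural_s` and the sample line are assumptions made for illustration.

```python
import re

# Same idea as the substitution above, but anchored on a word boundary
# (\b) instead of the end of the line, so it applies to every word.
TRAILING_S = re.compile(r"([A-Za-z]+)[sS]\b")

def strip_plural_s(line):
    """Drop a final s or S from every word on the line."""
    return TRAILING_S.sub(r"\1", line)

print(strip_plural_s("get movies from pages"))
```

This touches every word that ends in s, so a word such as "this" would be shortened as well; it is only safe on text where you know which words appear.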

jahroy answered Nov 14 '22 22:11