I need to remove ordinals via regex, but my regex skills are quite lacking. The following locates the ordinals, but includes the digit just prior in the return value. I need to isolate and remove just the ordinal.
[0-9](?:st|nd|rd|th)
You need to use a look-behind assertion so that only st|nd|rd|th preceded by a [0-9] are matched, but the [0-9] isn't included in the match, i.e.:
(?<=[0-9])(?:st|nd|rd|th)
I've linked to the Perl-compatible syntax, but if you're using POSIX, POSIX extended, vi, or one of the many other regex syntaxes, you'll need to look up the equivalent syntax.
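As a minimal sketch of the look-behind approach, here it is in Python (an assumption on my part, since the question doesn't name a language; `strip_ordinal_suffixes` is just an illustrative name). Python's `re` module supports fixed-width look-behind, so the pattern can be used as-is:

```python
import re

# (?<=[0-9]) requires a digit immediately before the suffix,
# but the digit itself is not part of the match, so it survives the substitution.
ORDINAL_SUFFIX = re.compile(r"(?<=[0-9])(?:st|nd|rd|th)")

def strip_ordinal_suffixes(text: str) -> str:
    """Remove st/nd/rd/th when they directly follow a digit."""
    return ORDINAL_SUFFIX.sub("", text)

print(strip_ordinal_suffixes("the 1st, 2nd, 3rd and 4th of May"))
# the 1, 2, 3 and 4 of May
```

Note that this pattern has no trailing word boundary, so something like "5things" would become "5ings". Appending `\b` to the pattern avoids that if it matters for your input.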
In Perl:
$var =~ s{\b(\d+)(?:st|nd|rd|th)\b}{$1}g;
In PHP:
$var = preg_replace('/\\b(\d+)(?:st|nd|rd|th)\\b/', '$1', $var);
In .NET:
var = Regex.Replace(var, @"\b(\d+)(?:st|nd|rd|th)\b", "$1");
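For comparison, the same capture-group substitution can be sketched in Python (not one of the languages listed above; `strip_ordinals` is a hypothetical helper name). The digits are captured in group 1 and written back, and the `\b` anchors keep the pattern from touching text such as "5things":

```python
import re

def strip_ordinals(text: str) -> str:
    """Replace '<digits><suffix>' with just the digits."""
    # \b...\b: the number must start and the suffix must end on a word boundary.
    # r"\1" writes back the captured digits, dropping the suffix.
    return re.sub(r"\b(\d+)(?:st|nd|rd|th)\b", r"\1", text)

print(strip_ordinals("the 21st century, not 5things"))
# the 21 century, not 5things
```

`re.sub` replaces every match by default, so no extra flag is needed to handle several ordinals in one string.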
If you also want to remove the numbers that the ordinal suffixes are attached to, you could use this one:
[0-9]+(?:st| st|nd| nd|rd| rd|th| th)
So for the text "The 3rd person is missing but the 2 nd and the 1st is here", the output is "The person is missing but the and the is here" (the spaces on either side of each removed number are left in place).
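A rough Python sketch of this variant follows. Two things here are my own additions rather than part of the answer: the alternation is written in the shorter but equivalent form `[0-9]+ ?(?:st|nd|rd|th)`, and a second substitution collapses the runs of spaces left behind. `remove_ordinal_numbers` is an illustrative name.

```python
import re

# Digits, an optional single space, then the suffix
# (same matches as st| st|nd| nd|rd| rd|th| th).
NUMBER_WITH_ORDINAL = re.compile(r"[0-9]+ ?(?:st|nd|rd|th)")

def remove_ordinal_numbers(text: str) -> str:
    """Delete numbers together with their ordinal suffix."""
    stripped = NUMBER_WITH_ORDINAL.sub("", text)
    # Removing "3rd" from "The 3rd person" leaves two spaces; collapse them.
    return re.sub(r" {2,}", " ", stripped)

print(remove_ordinal_numbers(
    "The 3rd person is missing but the 2 nd and the 1st is here"
))
# The person is missing but the and the is here
```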
Try a positive lookbehind:
(?<=[0-9])(?:st|nd|rd|th)
assuming the dialect of regex supports it.