
Find all but the first occurrence of a character with REGEX

Tags: .net, regex

I'm building a .NET application and I need to strip every non-digit character from a string, keeping only the first '.'. Essentially I'm cleaning user input to force a real-number result.

So far I've been using online RegEx tools to try and achieve this in a single pass, but I'm not getting very far.

I wish to accomplish this:

asd123.asd123.123.123 = 123.123123123

Unfortunately I've only managed to get to the stage where

asd123.asd123.123.123 = 123.123.123.123

by using this code.

System.Text.RegularExpressions.Regex.Replace(str, "[^\.|\d]*", "")

But I am stuck trying to remove all but the first decimal-point.

Can this be done in a single pass?
Is there a better-way™?

Mike asked Nov 26 '10 16:11



1 Answer

This can be done with a single regex, at least in .NET, which supports unbounded repetition inside lookbehind assertions:

resultString = Regex.Replace(subjectString, @"(?<!^[^.]*)\.|[^\d.]", "");

Explanation:

(?<!^[^.]*) # Either (provided at least one dot precedes this position)
\.          # match a dot
|           # or
[^\d.]      # match any character that is neither a digit nor a dot.

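(?<!^[^.]*) means: Assert that it's impossible to match a string that starts at the beginning of the input, ends at the current position, and consists solely of characters other than dots. This condition is false at the first dot (everything before it is dot-free) and true for every dot after the first one, so only those later dots are matched and removed.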
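To check the pattern against the example from the question, here is a minimal sketch. The class name InputCleaner and both method names are made up for illustration. CleanWithLoop is not part of the answer above; it is a plain single-pass alternative without regex, included only for comparison, and it assumes that char.IsDigit accepts the same characters as \d does by default.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative, not from the answer.
public static class InputCleaner
{
    // The pattern from the answer: removes every dot after the first one
    // and every character that is neither a digit nor a dot.
    private static readonly Regex Unwanted = new Regex(@"(?<!^[^.]*)\.|[^\d.]");

    public static string CleanWithRegex(string input)
    {
        return Unwanted.Replace(input, "");
    }

    // Assumed alternative (not from the answer): one pass over the string,
    // keeping digits and only the first dot.
    public static string CleanWithLoop(string input)
    {
        var sb = new StringBuilder(input.Length);
        bool seenDot = false;
        foreach (char c in input)
        {
            if (char.IsDigit(c))
            {
                sb.Append(c);
            }
            else if (c == '.' && !seenDot)
            {
                sb.Append(c);
                seenDot = true;
            }
            // Everything else, including later dots, is dropped.
        }
        return sb.ToString();
    }
}
```

With the input from the question, InputCleaner.CleanWithRegex("asd123.asd123.123.123") returns "123.123123123", and CleanWithLoop gives the same result. Note that a cleaned string such as "." or "" is still not a valid number, so the result probably still needs a decimal.TryParse (or similar) check afterwards.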

Tim Pietzcker answered Oct 18 '22 15:10
