
Regular expression for conditionally formatting a number string

Tags: string, c#, regex

(original question removed)


I am looking for a regular expression that reduces a string containing special characters, letters, and digits to a string containing only digits. There are special cases in which it is not enough to replace every non-numeric character with "" (the empty string).

1.) Zero in brackets.

  • If a bracket pair contains only zeros, as in (0), the whole pair should be removed, but only if it is the first bracket pair in the string. (In a later bracket pair containing only zeros, the zero should not be removed.)

2.) Leading zero.

  • All leading zeros should be removed (ignoring brackets).

Examples for better understanding:

  • 123 (0) 123 would be 123123 (zero removed)
  • (0) 123 -123 would be 123123 (zero and all other non-numeric characters removed)
  • 2(0) 123 (0) would be 21230 (first zero in brackets removed)
  • 20(0)123023(0) would be 201230230 (first zero in brackets removed)
  • 00(0)1 would be 1 (leading zeros removed)
  • 001(1)(0) would be 110 (leading zeros removed)
  • 0(0)02(0) would be 20 (leading zeros removed)
  • 123(1)3 would be 12313 (characters removed)
Florian asked Mar 06 '13 14:03


3 Answers

You could use a lookbehind to match (0) only if it's not at the beginning of the string, and replace with empty string as you're doing.

(original solution removed)


Updated again to reflect new requirements

Matches leading zeroes, matches (0) only if it's the first parenthesized item, and matches any non-digit characters:

^[0\D]+|(?<=^[^(]*)\(0\)|\D

Note that most regex engines do not support variable-length lookbehinds (i.e., quantifiers such as * inside the lookbehind), so this will work in only a few regex engines; .NET's is one of them.

^[0\D]+      # zeroes and non-digits at start of string
|            # or
(?<=^[^(]*)  # preceded by start of string and only non-"(" chars
\(0\)        # "(0)"
|            # or
\D           # non-digit, equivalent to "[^\d]"

(tested at regexhero.net)
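
As a minimal sketch of how this pattern might be applied in C#, every match can be replaced with an empty string via `Regex.Replace`. The class and method names (`SinglePassCleaner.Clean`) are hypothetical and not part of the answer; the pattern itself is copied unchanged from above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the class and method names are illustrative only.
public static class SinglePassCleaner
{
    // The pattern from the answer above. It relies on .NET's support
    // for variable-length lookbehind, so it is not portable to most
    // other regex engines.
    private static readonly Regex Unwanted =
        new Regex(@"^[0\D]+|(?<=^[^(]*)\(0\)|\D");

    // Deletes every match, leaving only the digits that should survive.
    public static string Clean(string input)
    {
        return Unwanted.Replace(input, "");
    }
}
```

For instance, `SinglePassCleaner.Clean("2(0) 123 (0)")` should yield `21230`, in line with the third example in the question.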


You've changed and added requirements several times now. For multiple rules like this, you're probably better off coding for them individually. It could become complicated and difficult to debug if one condition matches and causes another condition not to match when it should. For example, in separate steps:

  1. Remove parenthesized items as necessary.
  2. Remove non-digit characters.
  3. Remove leading zeroes.
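
The three steps above can be sketched in C# as follows. This is one possible reading of the steps, not code from the answer: the helper name `StepwiseCleaner.Clean` is hypothetical, and step 1 assumes that "as necessary" means dropping a `(0)` only when it is the first bracket pair in the string.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper; the class and method names are illustrative only.
public static class StepwiseCleaner
{
    public static string Clean(string input)
    {
        // Step 1: remove "(0)" only if no "(" occurs before it,
        // i.e. it is the first bracket pair in the string.
        string s = Regex.Replace(input, @"^([^(]*)\(0\)", "$1");

        // Step 2: remove every remaining non-digit character.
        s = Regex.Replace(s, @"\D", "");

        // Step 3: remove leading zeroes (an all-zero input becomes "").
        return s.TrimStart('0');
    }
}
```

Because each rule lives on its own line, a failing case can be traced to a single step. For example, `StepwiseCleaner.Clean("0(0)02(0)")` passes through `002(0)` and `0020` before ending at `20`.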

But if you absolutely need these three conditions all matched in a single regular expression (not recommended), here it is.

Wiseguy answered Sep 21 '22 18:09


Regexes get much, much simpler if you can use multiple passes. I think you could do a first pass to drop your (0) if it's not the first thing in a string, then follow it with stripping out the non-digits:

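// Pass 1: drop the first "(0)" when it follows at least one non-"(" character.
var noMidStrParenZero = Regex.Replace(text, @"^([^(]+)\(0\)", "$1");
// Pass 2: strip everything that is not a digit.
var finalStr = Regex.Replace(noMidStrParenZero, @"[^0-9]", "");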

Avoids a lot of regex craziness, and it's also self-documenting to an extent.

EDIT: this version should work with your new examples too.

super_seabass answered Sep 22 '22 18:09


This regex should be pretty near the one you're searching for.

(^[^\d])|([^\d](0[^\d])?)+

(Replace everything it matches with an empty string.)

EDIT :

Your request evolved, and is now too complex to be treated in a single pass. Assuming there is always a space before a bracket group, you can use these passes (keep this order):

string[] entries = new string[7] {
    "800 (0) 123 - 1",
    "800 (1) 123",
    "(0)321 123",
    "1 (0) 1",
    "1 (12) (0) 1",
    "1 (0) (0) 1",
    "(9)156 (1) (0)"
};
foreach (string entry in entries)
{
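    // Pass 1: collapse two consecutive "(0)" groups into a single literal 0.
    var output = Regex.Replace(entry, @"\(0\)\s*\(0\)", "0");
    // Pass 2: remove any "(0)" that is preceded by whitespace.
    output = Regex.Replace(output, @"\s\(0\)", "");
    // Pass 3: strip every remaining non-digit character.
    output = Regex.Replace(output, @"[^\d]", "");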
    System.Console.WriteLine("---");
    System.Console.WriteLine(entry);
    System.Console.WriteLine(output);
}
zessx answered Sep 21 '22 18:09