
Regex to parse international floating-point numbers

I need a regex to get numeric values that can look like any of these:

111.111,11

111,111.11

111,111

I also need to separate the integer and decimal portions so I can store them in a DB with the correct syntax.

I tried ([0-9]{1,3}[,.]?)+([,.][0-9]{2})? without success, since it doesn't detect the second part :(

The result should look like:

111.111,11 -> $1 = 111111; $2 = 11
LuRsT asked Aug 18 '09 17:08



3 Answers

First Answer:

This matches #,###,##0.00:

^[+-]?[0-9]{1,3}(?:\,?[0-9]{3})*(?:\.[0-9]{2})?$

And this matches #.###.##0,00:

^[+-]?[0-9]{1,3}(?:\.?[0-9]{3})*(?:\,[0-9]{2})?$

Joining the two (there are smarter/shorter ways to write it, but it works):

(?:^[+-]?[0-9]{1,3}(?:\,?[0-9]{3})*(?:\.[0-9]{2})?$)
|(?:^[+-]?[0-9]{1,3}(?:\.?[0-9]{3})*(?:\,[0-9]{2})?$)

You can also add a capturing group around the last comma (or dot) to check which one was used.
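As a rough illustration of that last suggestion, here is a minimal Python sketch of the joined pattern with one capturing group per decimal separator. The names JOINED and decimal_separator are made up for this example, and Python is an assumption, since the answer does not name a regex flavor.

```python
import re

# Joined pattern from above, with a capturing group around each decimal
# separator: group 1 is set for #,###,##0.00, group 2 for #.###.##0,00.
JOINED = re.compile(
    r"(?:^[+-]?[0-9]{1,3}(?:,?[0-9]{3})*(?:(\.)[0-9]{2})?$)"
    r"|(?:^[+-]?[0-9]{1,3}(?:\.?[0-9]{3})*(?:(,)[0-9]{2})?$)"
)


def decimal_separator(text):
    """Return '.', ',' or '' (no decimals); raise ValueError on no match."""
    match = JOINED.match(text)
    if match is None:
        raise ValueError(f"not a recognised number: {text!r}")
    # At most one of the two groups is set, depending on the branch taken.
    return match.group(1) or match.group(2) or ""


print(decimal_separator("111.111,11"))  # ,
print(decimal_separator("111,111.11"))  # .
```

An input with no decimals, such as 111,111, returns an empty string because the first branch treats the comma as a thousands separator.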


Second Answer:

As pointed out by Alan M, my previous solution could fail to reject a value like 11,111111.00, where one comma is missing but the other isn't. After some tests I reached the following regex, which avoids this problem:

^[+-]?[0-9]{1,3}
(?:(?<comma>\,?)[0-9]{3})?
(?:\k<comma>[0-9]{3})*
(?:\.[0-9]{2})?$

This deserves some explanation:

  • ^[+-]?[0-9]{1,3} matches the first (1 to 3) digits;

  • (?:(?<comma>\,?)[0-9]{3})? matches an optional comma followed by 3 more digits, and captures the comma (or the absence of one) in a group called 'comma';

  • (?:\k<comma>[0-9]{3})* matches zero-to-any repetitions of the comma used before (if any) followed by 3 digits;

  • (?:\.[0-9]{2})?$ matches optional "cents" at the end of the string.

Of course, that will only cover #,###,##0.00 (not #.###.##0,00), but you can always join the regexes like I did above.
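For anyone trying this in Python, a small sketch of the same backreference idea follows. It assumes Python's re module, which writes named groups as (?P<name>...) and named backreferences as (?P=name) instead of the (?<name>...) and \k<name> syntax used above. CONSISTENT and is_consistent are illustrative names.

```python
import re

# Same structure as the regex above, translated to Python's syntax.
CONSISTENT = re.compile(
    r"^[+-]?[0-9]{1,3}"            # first 1 to 3 digits
    r"(?:(?P<comma>,?)[0-9]{3})?"  # optional comma + 3 digits, comma captured
    r"(?:(?P=comma)[0-9]{3})*"     # same separator (or none) + 3 digits
    r"(?:\.[0-9]{2})?$"            # optional "cents"
)


def is_consistent(text):
    """True if commas are used for every thousands group or for none."""
    return CONSISTENT.match(text) is not None


print(is_consistent("1,111,111.00"))  # True
print(is_consistent("11,111111.00"))  # False
```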


Final Answer:

Now, a complete solution. Indentations and line breaks are there for readability only.

^[+-]?[0-9]{1,3}
(?:
    (?:\,[0-9]{3})*
    (?:\.[0-9]{2})?
|
    (?:\.[0-9]{3})*
    (?:\,[0-9]{2})?
|
    [0-9]*
    (?:[\.\,][0-9]{2})?
)$

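And this variation captures the separators used (it reuses the group names 'thousand' and 'decimal' across alternatives, so it needs a regex flavor that permits duplicate group names, such as .NET):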

^[+-]?[0-9]{1,3}
(?:
    (?:(?<thousand>\,)[0-9]{3})*
    (?:(?<decimal>\.)[0-9]{2})?
|
    (?:(?<thousand>\.)[0-9]{3})*
    (?:(?<decimal>\,)[0-9]{2})?
|
    [0-9]*
    (?:(?<decimal>[\.\,])[0-9]{2})?
)$
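
To tie this back to the question ($1 = integer part, $2 = decimals), here is a hedged Python sketch built on the complete pattern. It assumes Python's re module, which rejects duplicate group names, so each alternative gets its own hypothetical name (dec_a, dec_b, dec_c). NUMBER and split_number are also illustrative names, not part of the answer.

```python
import re

# Complete pattern from above in verbose mode; only the decimals are captured.
NUMBER = re.compile(r"""
    ^[+-]?[0-9]{1,3}
    (?:
        (?:,[0-9]{3})*   (?:\.(?P<dec_a>[0-9]{2}))?
      | (?:\.[0-9]{3})*  (?:,(?P<dec_b>[0-9]{2}))?
      | [0-9]*           (?:[.,](?P<dec_c>[0-9]{2}))?
    )$
""", re.VERBOSE)


def split_number(text):
    """Return (integer_part, decimal_part) as strings, or None if invalid."""
    match = NUMBER.match(text)
    if match is None:
        return None
    decimals = (match.group("dec_a") or match.group("dec_b")
                or match.group("dec_c") or "")
    # Drop the decimals and their separator, then strip thousands separators.
    whole = text[: len(text) - len(decimals) - 1] if decimals else text
    return re.sub(r"[.,]", "", whole), decimals


print(split_number("111.111,11"))  # ('111111', '11')
print(split_number("111,111"))     # ('111111', '')
```

Returning strings rather than numbers keeps any sign and leading digits intact, so the caller can decide how to build the value for the database.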

edit 1: "cents" are now optional; edit 2: text added; edit 3: second solution added; edit 4: complete solution added; edit 5: headings added; edit 6: capturing added; edit 7: last answer broke in two versions;

jpbochi answered Sep 27 '22 16:09


I would first use this regex to determine whether a comma or a dot is used as the decimal delimiter (it captures the last of the two to appear):

[0-9,\.]*([,\.])[0-9]*

I would then strip every occurrence of the other sign (the one the previous regex didn't capture). If there was no match, you already have an integer and can skip the next steps. The removal can easily be done with a regex, but plain string-replacement functions can often do it faster or more simply.

You are then left with a number in the form of an integer, possibly followed by a comma or a dot and then the decimals, where the integer and decimal parts can easily be separated from each other with the following regex.

([0-9]+)[,\.]?([0-9]*)

Good luck!

Edit:

Here is an example written in Python. I assume the code is self-explanatory; if it is not, just ask.

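import re

text = input()
# Group 1 captures the last ',' or '.' in the string: the decimal delimiter.
delimiter_regex = re.compile(r'[0-9,.]*([,.])[0-9]*')
# Splits the cleaned-up string into integer and decimal parts (next step).
split_regex = re.compile(r'([0-9]+)[,.]?([0-9]*)')

delimiter = delimiter_regex.findall(text)

# Strip the other sign; an empty list means there was no delimiter at all.
if delimiter and delimiter[0] == ',':
    text = text.replace('.', '')
elif delimiter and delimiter[0] == '.':
    text = text.replace(',', '')

print(text)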

With this code, the following inputs give these outputs:

  • 111.111,11

    111111,11

  • 111,111.11

    111111.11

  • 111,111

    111,111

After this step, you can easily modify the string to match your needs.
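
For instance, the remaining split could be sketched as below. This is only an illustration in Python 3: SPLIT is the split regex from this answer, while split_parts and the use of fullmatch (so that stray characters are rejected) are assumptions added here.

```python
import re

# Split regex from this answer: integer part, optional delimiter, decimals.
SPLIT = re.compile(r"([0-9]+)[,.]?([0-9]*)")


def split_parts(cleaned):
    """Split an already cleaned-up string into (integer, decimals)."""
    match = SPLIT.fullmatch(cleaned)
    if match is None:
        raise ValueError(f"unexpected input: {cleaned!r}")
    return match.group(1), match.group(2)


print(split_parts("111111,11"))  # ('111111', '11')
print(split_parts("111,111"))    # ('111', '111')
```

Note that 111,111 comes out as 111 with decimals 111, because this approach always treats the last separator as the decimal delimiter.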

Håkon answered Sep 27 '22 17:09


How about

/(\d{1,3}(?:,\d{3})*)(\.\d{2})?/

if you care about validating that the commas separate every 3 digits exactly, or

/(\d[\d,]*)(\.\d{2})?/

if you don't.
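
A quick way to see the difference between the two is the Python sketch below. It is an illustration only: the patterns are the ones above without the / delimiters, fullmatch is used to anchor them (the originals are unanchored), and STRICT, LENIENT and parse are names invented for the example.

```python
import re

STRICT = re.compile(r"(\d{1,3}(?:,\d{3})*)(\.\d{2})?")   # commas every 3 digits
LENIENT = re.compile(r"(\d[\d,]*)(\.\d{2})?")            # commas anywhere


def parse(pattern, text):
    """Return (integer, decimals) with commas removed, or None if no match."""
    match = pattern.fullmatch(text)
    if match is None:
        return None
    integer = match.group(1).replace(",", "")
    decimals = (match.group(2) or "")[1:]  # drop the leading dot, if any
    return integer, decimals


print(parse(STRICT, "111,111.11"))  # ('111111', '11')
print(parse(STRICT, "11,11.00"))    # None
print(parse(LENIENT, "11,11.00"))   # ('1111', '00')
```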

Avi answered Sep 27 '22 15:09