 

Regex to match any number (Real, rational along with signs)

Tags:

java

regex

I've written a regex to match any number:

  • Positive and Negative
  • Decimal
  • Real Numbers

The following regex works well, but there's one drawback:

([\+\-]{1}){0,1}?[\d]*(\.{1})?[\\d]*

It also matches inputs such as + or - on their own. Any pointers will be greatly appreciated. Thanks.

The regex should match the following inputs:

5, +5, -5, 0.5, +0.5, -0.5, .5, +.5, -.5

and shouldn't match the following inputs:

+

-

+.

-.

.
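Here is a minimal Java sketch that makes the drawback concrete by running the question's pattern against both lists with `Matcher.matches()`, which requires the entire input to match. The class and method names are made up for illustration, and the final `[\\d]*` in the question is assumed to be a leftover Java string escape meant as `[\d]*`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class QuestionRegexCheck {
    // The question's pattern as a Java string literal.
    // Assumption: the trailing [\\d]* was meant as [\d]* (any number of digits).
    static final Pattern ORIGINAL =
            Pattern.compile("([\\+\\-]{1}){0,1}?[\\d]*(\\.{1})?[\\d]*");

    // Inputs taken from the question.
    static final String[] SHOULD_MATCH =
            {"5", "+5", "-5", "0.5", "+0.5", "-0.5", ".5", "+.5", "-.5"};
    static final String[] SHOULD_NOT_MATCH = {"+", "-", "+.", "-.", "."};

    // Wanted inputs that the pattern fails to match as a whole.
    static List<String> wronglyRejected(Pattern p) {
        List<String> out = new ArrayList<>();
        for (String s : SHOULD_MATCH) {
            if (!p.matcher(s).matches()) {
                out.add(s);
            }
        }
        return out;
    }

    // Unwanted inputs that the pattern nonetheless matches as a whole.
    static List<String> wronglyAccepted(Pattern p) {
        List<String> out = new ArrayList<>();
        for (String s : SHOULD_NOT_MATCH) {
            if (p.matcher(s).matches()) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("wrongly rejected: " + wronglyRejected(ORIGINAL)); // prints "wrongly rejected: []"
        System.out.println("wrongly accepted: " + wronglyAccepted(ORIGINAL)); // prints "wrongly accepted: [+, -, +., -., .]"
    }
}
```

All nine wanted inputs pass, but so does every unwanted one: the sign, the digits and the dot are each optional, so nothing forces a digit to appear.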

Here is the answer by tchrist, which works perfectly:

(?:(?i)(?:[+-]?)(?:(?=[.]?[0-9])(?:[0-9]*)(?:(?:[.])(?:[0-9]{0,}))?)(?:(?:[E])(?:(?:[+-]?)(?:[0-9]+))|))
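A minimal Java sketch of using that pattern, checked against the same two lists. The class name `RealNumberRegex` and the helper `isReal` are hypothetical. Because the pattern contains no backslashes, it can go into a Java string literal unchanged; it is only split across two literals here for readability.

```java
import java.util.regex.Pattern;

public class RealNumberRegex {
    // tchrist's pattern as quoted above ([0-9] instead of [0123456789]),
    // split over two literals only for readability. (?i) lets [E] also match "e".
    static final Pattern REAL = Pattern.compile(
            "(?:(?i)(?:[+-]?)(?:(?=[.]?[0-9])(?:[0-9]*)(?:(?:[.])(?:[0-9]{0,}))?)"
            + "(?:(?:[E])(?:(?:[+-]?)(?:[0-9]+))|))");

    // True when the entire string is a number; matches() anchors both ends.
    static boolean isReal(String s) {
        return REAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"5", "+5", "-5", "0.5", "+0.5", "-0.5", ".5", "+.5", "-.5",
                           "+", "-", "+.", "-.", "."};
        for (String s : inputs) {
            System.out.println(s + " -> " + isReal(s));
        }
    }
}
```

Run as written, the first nine inputs print `true` and the last five print `false`. The lookahead `(?=[.]?[0-9])` is what rejects a lone sign or dot: after the optional sign, it insists that a digit follows, possibly after a single dot.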

Asked Feb 18 '12 13:02 by Ragunath Jawahar





2 Answers

If you want something that looks like a C float, here’s how to tickle Perl into coughing out a regex that does that, using the Regexp::Common module from CPAN:

$ perl -MRegexp::Common -le 'print $RE{num}{real}'
(?:(?i)(?:[+-]?)(?:(?=[.]?[0123456789])(?:[0123456789]*)(?:(?:[.])(?:[0123456789]{0,}))?)(?:(?:[E])(?:(?:[+-]?)(?:[0123456789]+))|))

You can tune that a bit if you want, but that gives you the basic idea.
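The generated pattern has no `^` or `$` anchors, so how it is applied matters. A sketch, assuming Java and with hypothetical class and method names: `Matcher.find()` pulls numbers out of running text, while `Matcher.matches()` validates a whole string. It also exercises forms beyond the question's list, such as exponents written with either `e` or `E`, which the `(?i)` flag allows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractReals {
    // Regexp::Common's $RE{num}{real} output, pasted verbatim (split for readability).
    static final Pattern REAL = Pattern.compile(
            "(?:(?i)(?:[+-]?)(?:(?=[.]?[0123456789])(?:[0123456789]*)"
            + "(?:(?:[.])(?:[0123456789]{0,}))?)"
            + "(?:(?:[E])(?:(?:[+-]?)(?:[0123456789]+))|))");

    // The pattern is unanchored, so find() locates numbers embedded in text.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = REAL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract("x = -3.5e2, y = .25, n = 7"));
        // matches() anchors at both ends, so the same pattern validates a whole string.
        System.out.println(REAL.matcher("6.02E23").matches());
    }
}
```

The `main` method prints `[-3.5e2, .25, 7]` and then `true`.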

It’s really remarkably flexible. For example, this spits out a pattern for base-2 real numbers that allow commas every three places:

$ perl -MRegexp::Common -le 'print $RE{num}{real}{-base => 2}{-sep => ","}{-group => 3}'
(?:(?i)(?:[+-]?)(?:(?=[.]?[01])(?:[01]{1,3}(?:(?:[,])[01]{3})*)(?:(?:[.])(?:[01]{0,}))?)(?:(?:[E])(?:(?:[+-]?)(?:[01]+))|))
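To try the base-2 variant from Java, here is a sketch along the same lines, with the printed pattern pasted as-is. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class BinaryGroupedReal {
    // Output of $RE{num}{real}{-base => 2}{-sep => ","}{-group => 3}, pasted verbatim.
    static final Pattern BIN = Pattern.compile(
            "(?:(?i)(?:[+-]?)(?:(?=[.]?[01])(?:[01]{1,3}(?:(?:[,])[01]{3})*)"
            + "(?:(?:[.])(?:[01]{0,}))?)"
            + "(?:(?:[E])(?:(?:[+-]?)(?:[01]+))|))");

    // True when the whole string is a grouped binary real number.
    static boolean isBinaryReal(String s) {
        return BIN.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"101", "1,010", "-1,010.01", "1,000,000", "1010", "1,01", "2"};
        for (String s : inputs) {
            System.out.println(s + " -> " + isBinaryReal(s));
        }
    }
}
```

Only the first four inputs are accepted. As generated, the separators are mandatory: `1010` is rejected because it lacks the comma, `1,01` because its last group has two digits instead of three, and `2` because it is not a binary digit.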

The documentation shows that the full possible syntax for the numeric patterns it can spit out for you is:

$RE{num}{int}{-base}{-sep}{-group}{-places} 
$RE{num}{real}{-base}{-radix}{-places}{-sep}{-group}{-expon} 
$RE{num}{dec}{-radix}{-places}{-sep}{-group}{-expon} 
$RE{num}{oct}{-radix}{-places}{-sep}{-group}{-expon} 
$RE{num}{bin}{-radix}{-places}{-sep}{-group}{-expon} 
$RE{num}{hex}{-radix}{-places}{-sep}{-group}{-expon} 
$RE{num}{decimal}{-base}{-radix}{-places}{-sep}{-group} 
$RE{num}{square} 
$RE{num}{roman}

That makes it really easy to customize for whatever you want. And yes, of course you can use these patterns in Java.

Enjoy.

Answered Oct 11 '22 22:10 by tchrist



You need to require at least one digit, i.e. use + instead of * for the \d.

I think you can also drop the {1} in several places, since it is implied by default.

Similarly, {0,1} can be dropped when it is followed by ?, since ? on its own already means "zero or one".

Giving us:

String regex = "[+-]?(\\d+|\\d*\\.?\\d+)";
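A quick Java check of this pattern against the inputs from the question. The class and method names are hypothetical; `Matcher.matches()` anchors the pattern at both ends, so no `^` or `$` is needed.

```java
import java.util.regex.Pattern;

public class SimpleNumberRegex {
    // DNA's pattern: optional sign, then either digits only, or optional digits,
    // an optional dot and at least one more digit.
    static final Pattern NUMBER = Pattern.compile("[+-]?(\\d+|\\d*\\.?\\d+)");

    // matches() requires the whole input to match.
    static boolean isNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] wanted = {"5", "+5", "-5", "0.5", "+0.5", "-0.5", ".5", "+.5", "-.5"};
        String[] unwanted = {"+", "-", "+.", "-.", "."};
        for (String s : wanted) {
            System.out.println(s + " -> " + isNumber(s));   // all true
        }
        for (String s : unwanted) {
            System.out.println(s + " -> " + isNumber(s));   // all false
        }
    }
}
```

All nine wanted inputs are accepted and all five unwanted ones rejected. A trailing dot such as `5.` is also rejected, since a dot must be followed by at least one digit; the Regexp::Common pattern accepts it.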

Answered Oct 11 '22 22:10 by DNA
