I have the following regular expression, which I use to find numbers in strings:
-?\d*\.?\d+([eE][-+]?\d+)?
I wish to modify it so that it matches only floating-point numbers and not integers. The criterion for this (as best I can discern) is that the match must contain at least one of ., e, or E. However, I cannot think of a nice way of incorporating this requirement into the regex without duplicating most of the body.
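To make the problem concrete, here is a small check showing that the pattern above also picks up plain integers. Python is an assumption here (the question does not depend on a particular language), and the sample string is made up for illustration.

```python
import re

# The pattern from the question, unchanged.
number_re = re.compile(r"-?\d*\.?\d+([eE][-+]?\d+)?")

# Hypothetical sample input, chosen only for illustration.
sample = "1 1.1 1e1 42"

matches = [m.group(0) for m in number_re.finditer(sample)]
print(matches)  # ['1', '1.1', '1e1', '42']
```

The integers 1 and 42 are matched alongside the floats, which is the behaviour I want to eliminate.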
Duplicate
After a bit of searching, I came across "Regular expressions match floating point number but not integer", which, although not clearly titled, is an exact duplicate of this problem (including a solution).
The following regex does this, although it's a bit cryptic:
-?(?:\d+())?(?:\.\d*())?(?:e-?\d+())?(?:\2|\1\3)
Explanation:
There are three parts to a number (integer part, fractional part and exponential part). If a fractional part is present, it's a float, but if it isn't present, the number can still be a float when an exponential part follows.
This means that we first have to make all three parts optional in the regex. But then we need to build rules that specify exactly which parts need to be there to make a valid float.
Fortunately, there's a trick that allows us to do that. An empty capturing group, (), always matches (the empty string). A backreference to that group (\1) only succeeds if the group has participated in the match. By inserting a () in each of the optional groups, we can later test whether the required parts have participated in the match.
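A minimal sketch of the trick in isolation, using Python's re module. The pattern and the helper name matches_demo are made up for illustration and are not part of the regex above. Note that the trick relies on the engine failing a backreference to a group that did not participate; Python behaves this way, but some engines (JavaScript, for one) let such a backreference match the empty string instead.

```python
import re

# "x" is optional, but the trailing \1 only succeeds if the optional
# group (and therefore the empty group inside it) took part in the match.
demo = re.compile(r"(?:x())?y\1")

def matches_demo(text):
    """Return True if the whole of text matches the demo pattern."""
    return demo.fullmatch(text) is not None

print(matches_demo("xy"))  # True: group 1 captured "", so \1 matches ""
print(matches_demo("y"))   # False: group 1 never participated, so \1 fails
```

In effect, the backreference turns "this optional part was present" into a condition that can be checked later in the pattern.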
For example, in Python:
import re

regex = re.compile(r"""
    -?          # Optional minus sign
    (?:         # Start of the first non-capturing group:
      \d+       #   Match a number (integer part)
      ()        #   Match the empty string, capture in group 1
    )?          # Make the first non-capturing group optional
    (?:         # Start of the second non-capturing group:
      \.\d*     #   Match a dot and an optional fractional part
      ()        #   Match the empty string, capture in group 2
    )?          # Make the second non-capturing group optional
    (?:         # Start of the third non-capturing group:
      e         #   Match an e or E (case is ignored via re.I)
      -?        #   Match an optional minus sign
      \d+       #   Match a mandatory exponent
      ()        #   Match the empty string, capture in group 3
    )?          # Make the third non-capturing group optional
    (?:         # Now make sure that at least the following groups participated:
      \2        #   Either group 2 (containing the empty string)
    |           #   or
      \1\3      #   Groups 1 and 3 (because "1" or "e1" alone aren't valid matches)
    )""", re.I | re.X)
Test suite:
>>> [match.group(0) for match in
... regex.finditer("1 1.1 .1 1. 1e1 1.04E-1 -.1 -1. e1 .1e1")]
['1.1', '.1', '1.', '1e1', '1.04E-1', '-.1', '-1.', '.1e1']
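One caveat worth checking before relying on the matches: because the integer part is optional and the fractional part is \.\d*, a lone . (or -.) also satisfies the pattern. One possible guard, sketched below, is to pass each match through float() and discard whatever it rejects. The names FLOAT_RE and extract_floats are hypothetical, and the one-line pattern is the same regex as above written without comments.

```python
import re

# Same regex as above, in compact form.
FLOAT_RE = re.compile(r"-?(?:\d+())?(?:\.\d*())?(?:e-?\d+())?(?:\2|\1\3)", re.I)

def extract_floats(text):
    """Return the float-like substrings of text, converted with float()."""
    values = []
    for match in FLOAT_RE.finditer(text):
        try:
            values.append(float(match.group(0)))
        except ValueError:
            # A lone "." or "-." satisfies the pattern but is not a number.
            continue
    return values

print(extract_floats("end. 2.5"))  # [2.5]
```

Here the full stop after "end" is matched by the regex but dropped by the float() check, so only 2.5 is returned.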