In a regular expression, match one thing or another, or both

In a regular expression, I need to know how to match one thing or another, or both (in order). But at least one of the things needs to be there.

For example, the following regular expression

/^([0-9]+|\.[0-9]+)$/ 

will match

234 

and

.56 

but not

234.56 

While the following regular expression

/^([0-9]+)?(\.[0-9]+)?$/ 

will match all three of the strings above, but it will also match the empty string, which we do not want.

I need something that will match all three of the strings above, but not the empty string. Is there an easy way to do that?
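To make the problem concrete, here is a minimal sketch in Python (an assumed host language; the question does not name one) that runs both patterns against the three sample strings plus the empty string. The variable names are illustrative only.

```python
import re

# The two patterns from the question, without the /.../ delimiters.
either = re.compile(r"^([0-9]+|\.[0-9]+)$")
both_optional = re.compile(r"^([0-9]+)?(\.[0-9]+)?$")

samples = ["234", ".56", "234.56", ""]

for s in samples:
    # Columns: input, matched by the first pattern, matched by the second.
    print(repr(s), bool(either.match(s)), bool(both_optional.match(s)))
```

The first pattern rejects 234.56, and the second accepts every sample including the empty string, which is exactly the gap described above.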

UPDATE:

Both Andrew's and Justin's answers below work for the simplified example I provided, but (unless I'm mistaken) they don't work for the actual use case I was hoping to solve, so I should probably add that now. Here's the actual regexp I'm using:

/^\s*-?0*(?:[0-9]+|[0-9]{1,3}(?:,[0-9]{3})+)(?:\.[0-9]*)?(\s*|[A-Za-z_]*)*$/ 

This will match

45
45.988
45,689
34,569,098,233
567,900.90
-9
-34 banana fries
0.56 points

but it WON'T match

.56 

and I need it to do this.
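The behaviour described above can be reproduced with a short sketch, again assuming Python. Splitting the sample text into eight separate strings is an assumption based on what the pattern can accept (for example, "-34 banana fries" is treated as one input).

```python
import re

# The regexp from the update, copied as-is without the /.../ delimiters.
actual = re.compile(
    r"^\s*-?0*(?:[0-9]+|[0-9]{1,3}(?:,[0-9]{3})+)(?:\.[0-9]*)?(\s*|[A-Za-z_]*)*$"
)

# Assumed split of the sample text into individual inputs.
accepted = [
    "45", "45.988", "45,689", "34,569,098,233",
    "567,900.90", "-9", "-34 banana fries", "0.56 points",
]

for s in accepted + [".56"]:
    print(repr(s), bool(actual.match(s)))
```

The pattern requires at least one digit before the optional fractional part, which is why .56 is rejected.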

asked Nov 12 '12 21:11 by rharrington



1 Answer

The fully general method, given regexes /^A$/ and /^B$/ is:

/^(A|B|AB)$/ 

i.e.

/^([0-9]+|\.[0-9]+|[0-9]+\.[0-9]+)$/ 
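As a rough sketch of the A|B|AB construction, assuming Python: the helper `one_or_both` and the constants `INT`, `FRAC`, and `GROUPED_INT` are hypothetical names introduced here, not part of any library. The second pattern applies the same idea to the longer regexp from the question's update. It assumes a fractional part needs at least one digit after the dot (`\.[0-9]+`), which is slightly stricter than the original `\.[0-9]*`.

```python
import re

def one_or_both(a: str, b: str) -> str:
    """Build a fragment matching A, B, or A followed by B.

    Each piece is wrapped in (?:...) so that an alternation inside A or B
    cannot leak into the surrounding pattern. The result never matches the
    empty string as long as neither A nor B can.
    """
    return f"(?:(?:{a})|(?:{b})|(?:{a})(?:{b}))"

INT = r"[0-9]+"
FRAC = r"\.[0-9]+"  # assumption: at least one digit after the dot

# The simplified example from the question.
simple = re.compile("^" + one_or_both(INT, FRAC) + "$")

# The same idea applied to the regexp from the update: the integer part
# (with optional thousands separators) plays the role of A.
GROUPED_INT = r"0*(?:[0-9]+|[0-9]{1,3}(?:,[0-9]{3})+)"
actual = re.compile(
    r"^\s*-?" + one_or_both(GROUPED_INT, FRAC) + r"(\s*|[A-Za-z_]*)*$"
)

for s in ["234", ".56", "234.56", ""]:
    print(repr(s), bool(simple.match(s)), bool(actual.match(s)))
```

Because the prefix (`\s*-?`) and the trailing words sit outside the alternation, only the numeric core needs the one-or-both treatment.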

Note that the other answers used the structure of your example to simplify this. Specifically, they (implicitly) factorised it, pulling out the common [0-9]* and [0-9]+ factors on the left and right.

The working for this is:

  • all the elements of the alternation end in [0-9]+, so pull that out: /^(|\.|[0-9]+\.)[0-9]+$/
  • Now we have the possibility of the empty string in the alternation, so rewrite it using ? (i.e. use the equivalence (|a|b) = (a|b)?): /^(\.|[0-9]+\.)?[0-9]+$/
  • Again, an alternation with a common suffix (\. this time): /^((|[0-9]+)\.)?[0-9]+$/
  • the pattern (|a+) is the same as a*, so, finally: /^([0-9]*\.)?[0-9]+$/
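One way to sanity-check the steps above is to brute-force every short string over a small alphabet and confirm that each intermediate pattern accepts the same set. This is only a sketch, assuming Python; the function name `accepted`, the alphabet "07.a", and the length limit are arbitrary choices, and agreement on short strings is evidence rather than proof.

```python
import re
from itertools import product

# The general form followed by each factorisation step, in order.
steps = [
    r"^([0-9]+|\.[0-9]+|[0-9]+\.[0-9]+)$",
    r"^(|\.|[0-9]+\.)[0-9]+$",
    r"^(\.|[0-9]+\.)?[0-9]+$",
    r"^((|[0-9]+)\.)?[0-9]+$",
    r"^([0-9]*\.)?[0-9]+$",
]
compiled = [re.compile(p) for p in steps]

def accepted(regex, alphabet="07.a", max_len=5):
    """Return every string up to max_len over alphabet that regex matches."""
    out = set()
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if regex.match(s):
                out.add(s)
    return out

languages = [accepted(r) for r in compiled]
all_agree = all(lang == languages[0] for lang in languages)
print(all_agree)
```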
answered Sep 24 '22 17:09 by huon