RegEx to match part of string that may or may not be present

Tags:

regex

I have strings that are in the format:

X=Foo, Y=Bar, Z=Qux

However, sometimes only the X=... and Y=... parts are there, not the Z=... part, e.g.:

X=Foo, Y=Bar

The values can also contain commas, and those should be captured too, like:

X=Foo, bar, Y=Bar, Z=Qux

How can I write a regex to capture Foo, Bar, and Qux (just placeholders for this example) if present?

I've come up with this so far:

X=(.*), Y=(.*)           # Works when Z is not present
X=(.*), Y=(.*), Z=(.*)   # Works when Z is present

But I'm having trouble writing a single regex to match both cases. I also tried something like this:

X=(.*), Y=(.*)(, Z=(.*))?

I thought that by putting the , Z=(.*) in its own group followed by a ? the whole group would be treated as optional, but the Z=... part still ends up inside the value captured for Y=.

Joseph asked Jan 12 '16 00:01


1 Answer

You were very close. The group you introduced to make the last part optional is itself a capturing group, so the Z value belongs to group 4 rather than group 3.

Change the introduced group to a non-capturing group:

X=(.*?), Y=(.*?)(?:, Z=(.*))?$

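I also made the first two captures reluctant (.*? instead of .*, which is greedy and consumes the entire rest of the input) and anchored the pattern with $, so that when Z is absent the reluctant Y capture is still forced to extend to the end of the input.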

See live demo.
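
As a quick check, here is a minimal sketch of the pattern in Python. The question does not say which regex flavor is in use, so Python's re module is an assumption, and the names PATTERN, ATTEMPT and parse are hypothetical helpers made up for this example. It runs the fixed pattern against the three sample strings from the question and then shows what the original single-regex attempt captures.

```python
import re

# Fixed pattern from the answer: reluctant captures, a non-capturing
# optional group for the Z part, and an end-of-input anchor.
PATTERN = re.compile(r"X=(.*?), Y=(.*?)(?:, Z=(.*))?$")

# The asker's single-regex attempt, kept for comparison.
ATTEMPT = re.compile(r"X=(.*), Y=(.*)(, Z=(.*))?")


def parse(text):
    """Return (x, y, z) for a matching string; z is None when absent."""
    m = PATTERN.match(text)
    return m.groups() if m else None


print(parse("X=Foo, Y=Bar, Z=Qux"))       # ('Foo', 'Bar', 'Qux')
print(parse("X=Foo, Y=Bar"))              # ('Foo', 'Bar', None)
print(parse("X=Foo, bar, Y=Bar, Z=Qux"))  # ('Foo, bar', 'Bar', 'Qux')

# With the greedy attempt, the Y group swallows the Z part and the
# optional group matches nothing.
print(ATTEMPT.match("X=Foo, Y=Bar, Z=Qux").groups())
# ('Foo', 'Bar, Z=Qux', None, None)
```

In this sketch a missing Z shows up as None in the third position, so the caller can tell "no Z part" apart from an empty Z value.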

Bohemian answered Nov 01 '22 08:11