 

REGEX IF THEN ELSE Statement

Tags:

regex

I've got a need to write a regular expression that has me scratching my head. Essentially I have a column of data that includes values such as:

ACME Corp 123
Corp 742 ACME
Random Text
Broadway 1785 FB

What I want to do is look for the terms ACME or BROADWAY. If either exists, keep that term and only that term. If neither exists, keep the entire string. So the column above would turn into:

ACME
ACME
Random Text
Broadway

Does that make sense?

Domo Dan asked Sep 26 '17 17:09




3 Answers

Brief

This one had me scratching my head for a bit. I'm sure regex alone is not the best solution to this problem, but here is a regex that does it.


Code


Regex

^.*?((?(?=.*?(\b(?:broadway|acme)\b).*?)\2|.*)).*?$

Substitution

Replace each match with group 1, as below. Alternatively, you can read group 1 from the array of matches instead of substituting.

$1

Results

Note: I added another string as a test to ensure that a keyword placed midway through a line is still caught.

Input

ACME Corp 123
Corp 742 ACME
Some ACME some
Random Text
Broadway 1785 FB

Output

ACME
ACME
ACME
Random Text
Broadway

Explanation

Using the case-insensitive i and multi-line m flags:

  • ^ Assert position at the beginning of the line
  • .*? Match any character any number of times, but as few as possible
  • ((?(?=.*?(\b(?:broadway|acme)\b).*?)\2|.*)) Broken into parts
    • () Capture the following
      • (?(?=...)) If/else statement
      • (?=.*?(\b(?:broadway|acme)\b).*?) Positive lookahead to match the following
        • .*? Any character any number of times, but as few as possible
        • (...) Capture the following into capture group 2
        • \b(?:broadway|acme)\b word boundary, followed by either broadway or acme, followed by a word boundary
        • .*? Any character any number of times, but as few as possible
      • \2 If the condition is true (the lookahead above matches), match the text captured by group 2, which is simply broadway or acme
      • .* If the condition is false, match any character any number of times
  • .*? Match any character any number of times, but as few as possible
  • $ Assert position at the end of the line
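
Outside of a pure-regex setting, the same if/then/else can be spelled out in host-language code. Below is a minimal Python sketch, assuming the column arrives as a list of strings. The names KEYWORD and keep_keyword are made up for illustration and are not part of the regex above.

```python
import re

# Hypothetical helper names; the keyword list comes from the question.
KEYWORD = re.compile(r"\b(?:broadway|acme)\b", re.IGNORECASE)

def keep_keyword(value: str) -> str:
    """Return the first keyword found in value, or value unchanged."""
    match = KEYWORD.search(value)
    # "then" branch: keep only the keyword; "else" branch: keep the whole string
    return match.group(0) if match else value

rows = ["ACME Corp 123", "Corp 742 ACME", "Some ACME some", "Random Text", "Broadway 1785 FB"]
print([keep_keyword(row) for row in rows])
```

The branch lives in ordinary code here, so this works with any regex engine, at the cost of not being usable in a regex-only find-and-replace field.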

---

Update

Since my answer has garnered decent attention, I figured I should revise it. Not sure if the attention is for if/else in regex or if it relates more to the OP’s expected results from sample input.

if/else

I should mention that the general format for regex if/else is as follows (and that only certain regex engines support this construct):

(?(?=condition)x|y)

In the above regex, (?=condition) can be pretty much whatever you want (you can also use negative lookaheads or lookbehinds, or even combinations thereof).
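
Engines also differ in what they accept as the condition. Python's built-in re module, for example, only accepts a group reference there, written (?(1)x|y) or (?(name)x|y), and rejects a lookahead in that position. Here is a small sketch of the group-reference form. The pattern and sample strings are illustrative and not taken from the question.

```python
import re

# If group 1 (an opening parenthesis) took part in the match,
# require a closing parenthesis; otherwise require nothing more.
balanced = re.compile(r"^(\()?\d+(?(1)\))$")

for sample in ["123", "(123)", "(123", "123)"]:
    print(sample, bool(balanced.match(sample)))

# A lookahead used as the condition fails to compile in the re module.
try:
    re.compile(r"(?(?=a)b|c)")
    lookahead_condition_ok = True
except re.error:
    lookahead_condition_ok = False
print(lookahead_condition_ok)
```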

Alternatives

As if/else in regex isn’t supported in all languages, you may be able to use a workaround:

# optional group, fallback to match all (x?y)
^(?:.*?\b(broadway|acme)\b)?.*

# alternation (x||y)
^(?:.*?\b(broadway|acme)\b|.*)

# tempered greedy token alternation
^(?:(?!\b(?:broadway|acme)\b).|(broadway|acme))+

# same as above reusing capture group 1’s definition 
^(?:(?!\b(broadway|acme)\b).|((?1)))+
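
The workarounds above run in engines without conditionals. As a sketch, here is the optional-group variant in Python's re module. The replacement is a function rather than a plain group reference, because group 1 is unset on lines that contain no keyword. Variable names are illustrative.

```python
import re

text = "\n".join([
    "ACME Corp 123",
    "Corp 742 ACME",
    "Some ACME some",
    "Random Text",
    "Broadway 1785 FB",
])

# Optional group with a fallback that consumes the rest of the line.
pattern = re.compile(r"^(?:.*?\b(broadway|acme)\b)?.*", re.IGNORECASE | re.MULTILINE)

# Keep group 1 when it matched, otherwise keep the whole line (group 0).
result = pattern.sub(lambda m: m.group(1) or m.group(0), text)
print(result)
```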

ctwheels answered Oct 06 '22 19:10


A regex that will be sufficient to solve this problem is:

 ^(?(?=(acme|broadway))\1|[\w\s])+?$

Why is this sufficient? If either acme or broadway is in your input string, then group 1 will capture that value. If group 1 is empty, the full match is your result.

Breakdown:

 ^(?                          # start conditional
    (?=                       # lookahead for position before
      (                       # group 1 start
        acme|broadway         # either "acme" or "broadway"
      )                       # group 1 end
    )
    \1                        # if found, then match group 1
    |                         # else
    [\w\s]                    # read a word char or space
  )+?$                        # do this over and over again, non-greedy 


Marc Lambrichs answered Oct 06 '22 18:10


Here is another attempt:

(?:^.*)(ACME)(?:.*$)?|(?:^.*)(Broadway)(?:.*$)|^.*$


It's close to Marc Lambrichs' solution, but uses two capturing groups (which is arguably worse, but it depends on your needs). If neither of the two groups ($1 or $2) has a match, you will find the Random Text in the full match.

If you do not like the second capturing group you can try this:

(?:^.*?)(ACME|Broadway)(?:.*$)?|^.*?$

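Or, if you would like a single conditional in the style of ctwheels' solution (note that the keyword lands in $1 and the fallback line in $2, so the full match holds the result in both cases):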

(?(?=(?:^.*?)?(ACME|Broadway)(?:.*$)?)\1|(^.*?$))

As pointed out by Marc, a plus of my approach is that the first two patterns do not require advanced features that are unavailable in some regex engines.
Conditionals, as used in the third pattern, are not available everywhere.
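
As a rough check that the single-group pattern needs no conditional support, here is a sketch using Python's re module. The helper name extract is made up for illustration. The pattern is used as written, so it is case-sensitive.

```python
import re

# The single-group pattern from above, unchanged.
single_group = re.compile(r"(?:^.*?)(ACME|Broadway)(?:.*$)?|^.*?$", re.MULTILINE)

def extract(line: str) -> str:
    """Return the keyword from group 1 if present, else the full match."""
    m = single_group.search(line)
    return m.group(1) or m.group(0)

for line in ["ACME Corp 123", "Corp 742 ACME", "Random Text", "Broadway 1785 FB"]:
    print(extract(line))
```

A lowercase value such as acme corp falls through to the full match unless a case-insensitive flag is added.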

wp78de answered Oct 06 '22 19:10