I need to write a regular expression that has me scratching my head. Essentially I have a column of data that includes values such as:
ACME Corp 123
Corp 742 ACME
Random Text
Broadway 1785 FB
What I want to do is look for the terms ACME and BROADWAY. If either exists, keep that and only that. If neither exists, keep the entire string. So the column above would turn into:
ACME
ACME
Random Text
Broadway
Does that make sense?
This one had me scratching my head for a bit. I'm sure regex alone is not the best solution to this problem; however, here is one that works.
Regex
^.*?((?(?=.*?(\b(?:broadway|acme)\b).*?)\2|.*)).*?$
Substitution
Use group 1, as below. You can instead collect the group 1 values from an array of matches, but if you want to replace, you can use the following:
$1
Note: I added another string as a test to ensure that if either word was placed midway through a line, it would still be caught.
Input
ACME Corp 123
Corp 742 ACME
Some ACME some
Random Text
Broadway 1785 FB
Output
ACME
ACME
ACME
Random Text
Broadway
Using the case-insensitive i and multi-line m flags:
^
Assert position at the beginning of the line
.*?
Match any character any number of times, but as few as possible
((?(?=.*?(\b(?:broadway|acme)\b).*?)\2|.*))
Broken into parts:
  ()
  Capture the following into capture group 1
  (?(?=...))
  If/else statement
    (?=.*?(\b(?:broadway|acme)\b).*?)
    Positive lookahead to match the following
      .*?
      Any character any number of times, but as few as possible
      (...)
      Capture the following into capture group 2
        \b(?:broadway|acme)\b
        Word boundary, followed by either broadway or acme, followed by a word boundary
      .*?
      Any character any number of times, but as few as possible
    \2
    If the if/else statement is true (the lookahead matches), match the text captured in group 2, which is simply broadway or acme
    .*
    If the if/else statement is false, match any character any number of times
.*?
Match any character any number of times, but as few as possible
$
Assert position at the end of the line
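Since the if/else decision above does not have to live inside the regex, here is a minimal Python sketch of the same logic done in code. It assumes the values are single-line strings; the names KEYWORD and normalize are made up for this illustration and are not part of the original answer.

```python
import re

# Same keyword test as the lookahead above: a whole word, case-insensitive.
KEYWORD = re.compile(r"\b(?:broadway|acme)\b", re.IGNORECASE)

def normalize(value: str) -> str:
    """Keep only the keyword if one is present, otherwise keep the value."""
    found = KEYWORD.search(value)
    if found:
        # "If" branch: return the keyword exactly as it appears in the input.
        return found.group(0)
    # "Else" branch: no keyword, so keep the entire string.
    return value

for sample in ["ACME Corp 123", "Corp 742 ACME", "Random Text", "Broadway 1785 FB"]:
    print(normalize(sample))
```

This trades the single-pattern substitution for an ordinary branch, which also works in engines that have no conditional support.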
Since my answer has garnered decent attention, I figured I should revise it. Not sure if the attention is for if/else in regex or if it relates more to the OP’s expected results from sample input.
I should mention that the general format for regex if/else is as follows (note that only certain regex engines support this construct):
(?(?=condition)x|y)
In the above regex, (?=condition) can be pretty much whatever you want (you can also use negative lookaheads or lookbehinds, or even combinations thereof).
As if/else in regex isn’t supported in all languages, you may be able to use a workaround:
# optional group, fallback to match all (x?y)
^(?:.*?\b(broadway|acme)\b)?.*
# alternation (x||y)
^(?:.*?\b(broadway|acme)\b|.*)
# tempered greedy token alternation
^(?:(?!\b(?:broadway|acme)\b).|(broadway|acme))+
# same as above reusing capture group 1’s definition
^(?:(?!\b(broadway|acme)\b).|((?1)))+
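As a rough illustration, the optional-group workaround can be used from Python, whose re module does not accept lookahead conditionals. This is only a sketch: it assumes each value is a single line, and the names KEYWORD_OR_ALL and keep_keyword are hypothetical.

```python
import re

# The "optional group, fallback to match all" pattern from the list above,
# compiled case-insensitively like the original solution.
KEYWORD_OR_ALL = re.compile(r"^(?:.*?\b(broadway|acme)\b)?.*", re.IGNORECASE)

def keep_keyword(value: str) -> str:
    """Return the keyword if the optional group matched, else the whole value."""
    match = KEYWORD_OR_ALL.match(value)
    # group(1) is None when the optional group did not participate,
    # so fall back to the full match.
    return match.group(1) or match.group(0)

rows = ["ACME Corp 123", "Corp 742 ACME", "Some ACME some",
        "Random Text", "Broadway 1785 FB"]
cleaned = [keep_keyword(row) for row in rows]
print(cleaned)
```

The fallback happens in code rather than in the substitution string, because group 1 is unset when neither keyword is present.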
A regex that will be sufficient to solve this problem is:
^(?(?=(acme|broadway))\1|[\w\s])+?$
Why is this sufficient? If either acme or broadway is in your input string, then group 1 will capture that value. If group 1 is empty, the full match is your result.
Breakdown:
^(? # start conditional
(?= # lookahead for position before
( # group 1 start
acme|broadway # either "acme" or "broadway"
) # group 1 end
)
\1 # if found, then match group 1
| # else
[\w\s] # read a word char or space
)+?$ # do this over and over again, non-greedy
Here is another attempt:
(?:^.*)(ACME)(?:.*$)?|(?:^.*)(Broadway)(?:.*$)|^.*$
It's close to Marc Lambrichs's solution, but uses two capturing groups (which is arguably worse, but it depends on your needs). If neither of the two groups ($1 or $2) has a match, you will find Random Text in the full match.
If you do not like the second capturing group you can try this:
(?:^.*?)(ACME|Broadway)(?:.*$)?|^.*?$
Or if you would like to have everything in $1, as in ctwheels' solution:
(?(?=(?:^.*?)?(ACME|Broadway)(?:.*$)?)\1|(^.*?$))
As pointed out by Marc, a plus of my approach is that it does not require advanced features that are not available in all regex engines.
However, conditional regexes, as used in the third regex, are not available everywhere.
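To illustrate the point about engine support, the single-group alternation (the second regex in this answer) can be run with Python's re module over a whole column at once. This is a sketch under assumptions: the column is one newline-separated string, the match is case-sensitive as written, and the names ALTERNATION and pick are invented for the example.

```python
import re

# Second pattern from this answer; MULTILINE makes ^ and $ work per line.
ALTERNATION = re.compile(r"(?:^.*?)(ACME|Broadway)(?:.*$)?|^.*?$", re.MULTILINE)

column = "\n".join([
    "ACME Corp 123",
    "Corp 742 ACME",
    "Some ACME some",
    "Random Text",
    "Broadway 1785 FB",
])

def pick(match: re.Match) -> str:
    # Group 1 holds the keyword; it is None when the fallback branch matched,
    # in which case the full match (the whole line) is kept.
    return match.group(1) or match.group(0)

result = ALTERNATION.sub(pick, column)
print(result)
```

A replacement function is used instead of a plain $1 substitution so that lines without a keyword are kept rather than replaced by an empty group.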