Can I use a boolean AND condition in a regular expression?

Question

Say I have a DN string, something like this:

OU=Karen,OU=Office,OU=admin,DC=corp,DC=Fabrikam,DC=COM

How do I write a regular expression that picks only DNs that have both OU=Karen and OU=admin?

Eugene Ryabtsev · Accepted Answer

For reference, this is the regex lookahead solution, which matches the whole string if it contains the required parts in any order. If you do not need to store the pattern in some sort of configurable variable, though, I'd stick with nhahtdh's solution.

/^(?=.*OU=Karen)(?=.*OU=admin).*$/

^        - line start
(?=      - start zero-width positive lookahead
.*       - anything or nothing
OU=Karen - literal
)        - end zero-width positive lookahead
         - repeat with as many positive or negative lookaheads as required
.*       - the whole line
$        - line end
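As a rough sketch of how the lookahead pattern above might be used and built from a configurable list, here is a Python version. The names SIMPLE and build_and_pattern are made up for this illustration, and the boundary-aware variant is my own addition rather than part of the answer: the plain pattern tests for substrings, so it also accepts components such as OU=Karen2 or OU=administrator.

```python
import re

# The pattern from the answer: one lookahead per required substring.
SIMPLE = re.compile(r"^(?=.*OU=Karen)(?=.*OU=admin).*$")

def build_and_pattern(required):
    """Hypothetical helper: AND together whole comma-delimited components."""
    lookaheads = "".join(
        # (?:.*,)? lets the component sit at the start or right after a comma;
        # (?:,|$) requires a comma or the end of the string right after it.
        rf"(?=(?:.*,)?{re.escape(part)}(?:,|$))"
        for part in required
    )
    return re.compile(rf"^{lookaheads}.*$")

dn = "OU=Karen,OU=Office,OU=admin,DC=corp,DC=Fabrikam,DC=COM"
strict = build_and_pattern(["OU=Karen", "OU=admin"])

print(bool(SIMPLE.match(dn)))                            # both parts present
print(bool(SIMPLE.match("OU=Karen2,OU=administrator")))  # substring match only
print(bool(strict.match("OU=Karen2,OU=administrator")))  # whole components required
```

Each lookahead is checked from the start of the string and consumes nothing, which is why the order of the components does not matter.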

paxdiablo · Answer

You do realise you don't have to do everything with a single regex, or even with regexes at all?

Regular expressions are very good for catching classes of input but, if you have two totally fixed strings, you can just use a contains()-type method for both of them and then AND the results.

Alternatively, if you need to use regexes, you can do that twice (once per string) and AND the results together.
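A minimal sketch of both options in Python, assuming the two fixed strings from the question (the function names are invented for this example):

```python
import re

def has_both_substrings(dn):
    # contains()-type check for each fixed string, then AND the results
    return "OU=Karen" in dn and "OU=admin" in dn

def has_both_regexes(dn):
    # one regex search per string, then AND the results
    return bool(re.search(r"OU=Karen", dn)) and bool(re.search(r"OU=admin", dn))

dn = "OU=Karen,OU=Office,OU=admin,DC=corp,DC=Fabrikam,DC=COM"
print(has_both_substrings(dn), has_both_regexes(dn))
```

Both are plain substring tests, so a component such as OU=administrator would also count as a hit for OU=admin.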

If you need to do it with a single regex, you could try something like:

,OU=Karen,.*,OU=admin,|,OU=admin,.*,OU=Karen,

but you'll then have to also worry about when those stanzas appear at the start or end of the line, and all sorts of other edge cases (one or both at start or end, both next to each other, names like Karen7 or administrator-lesser, and so on).

Having to allow for all possibilities will probably end up with something monstrous like:

^OU=Karen(,[^,]*)*,OU=admin,|
^OU=Karen(,[^,]*)*,OU=admin$|
,OU=Karen(,[^,]*)*,OU=admin,|
,OU=Karen(,[^,]*)*,OU=admin$|
^OU=admin(,[^,]*)*,OU=Karen,|
^OU=admin(,[^,]*)*,OU=Karen$|
,OU=admin(,[^,]*)*,OU=Karen,|
,OU=admin(,[^,]*)*,OU=Karen$

although, with an advanced enough regex engine, this may be reducible to something smaller (though it would be unlikely to be any faster, simply because of all the look-ahead and backtracking involved).

One way this could be improved without a complex regex is to massage your string slightly beforehand so that boundary checks aren't needed:

newString = "," + origString.replace (",", ",,") + ","

so that it starts and ends with a comma and all commas within it are duplicated:

,OU=Karen,,OU=Office,,OU=admin,,DC=corp,,DC=Fabrikam,,DC=COM,

Then you need only check for the much simpler:

,OU=Karen,.*,OU=admin,|,OU=admin,.*,OU=Karen,

and this removes all the potential problems mentioned:

either element at the start of the string.
either element at the end of the string.
both elements abutting each other.
extended names like Karen2 being matched accidentally.
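The comma-doubling trick could look something like this in Python (a sketch only; PADDED and has_both are names chosen for this example):

```python
import re

# The simple alternation, applied to the padded string rather than the raw DN.
PADDED = re.compile(r",OU=Karen,.*,OU=admin,|,OU=admin,.*,OU=Karen,")

def has_both(dn):
    # Wrap the string in commas and double every inner comma so that each
    # component carries its own leading and trailing comma.
    padded = "," + dn.replace(",", ",,") + ","
    return PADDED.search(padded) is not None

print(has_both("OU=Karen,OU=Office,OU=admin,DC=corp,DC=Fabrikam,DC=COM"))
print(has_both("OU=admin,OU=Karen"))   # abutting, reverse order
print(has_both("OU=Karen2,OU=admin"))  # extended name
```

Because every component ends up with its own pair of commas, the two literals can never share a delimiter, even when they sit next to each other.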

Probably the best way to do this (if your language allows) is to simply split the string on commas and examine them, something like:

str = "OU=Karen,OU=Office,OU=admin,DC=corp,DC=Fabrikam,DC=COM"
elems[] = str.splitOn(",")

gotKaren = false
gotAdmin = false
for each elem in elems:
    if elem = "OU=Karen": gotKaren = true
    if elem = "OU=admin": gotAdmin = true

if gotKaren and gotAdmin:
    weaveYourMagicHere()

This both ignores the order in which they may appear and bypasses any regex "gymnastics" that may be required to detect the edge cases.

It also has the advantage of probably being more readable than the equivalent regex :-)
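For what it's worth, a runnable Python take on the split approach might look like the sketch below. It swaps the two flags for a set comparison so that any number of required components can be checked; has_all_components is a made-up name, and the naive split assumes no component contains an escaped comma (RFC 4514 allows \, inside values).

```python
def has_all_components(dn, required):
    # Split on commas and check that every required component is present,
    # regardless of order. Assumes no escaped commas inside values.
    return set(required) <= set(dn.split(","))

dn = "OU=Karen,OU=Office,OU=admin,DC=corp,DC=Fabrikam,DC=COM"

if has_all_components(dn, ["OU=Karen", "OU=admin"]):
    print("both present")  # weave your magic here
```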

Tags: regex, boolean-operations

ahmd0
