Can I use a boolean AND condition in a regular expression?

Say I have a DN string, something like this:

OU=Karen,OU=Office,OU=admin,DC=corp,DC=Fabrikam,DC=COM

How can I write a regular expression that picks only DNs that contain both OU=Karen and OU=admin?

ahmd0 asked May 31 '12 04:05


2 Answers

For reference, this is the regex lookahead solution, which matches the whole string if it contains the required parts in any order. Unless you need to store the pattern in some sort of configurable variable, though, I'd stick with nhahtdh's solution.

/^(?=.*OU=Karen)(?=.*OU=admin).*$/

^        - line start
(?=      - start zero-width positive lookahead
.*       - anything or nothing
OU=Karen - literal
)        - end zero-width positive lookahead
         - place as many positive or negative look-aheads as required
.*       - the whole line
$        - line end
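
As a quick check of the pattern above, here is a minimal Python sketch using the standard re module. The function name matches_lookahead and the STRICT pattern are not part of the answer; STRICT is a hypothetical tightening that assumes each value must be a complete comma-delimited component, since the original pattern is a plain substring test and would also accept something like OU=Karen2.

```python
import re

# Pattern from the answer: two lookaheads, then consume the whole line.
LOOSE = re.compile(r"^(?=.*OU=Karen)(?=.*OU=admin).*$")

# Hypothetical stricter variant (an assumption, not from the answer):
# each value must start at the beginning of the string or after a comma,
# and must be followed by a comma or the end of the string.
STRICT = re.compile(
    r"^(?=(?:.*,)?OU=Karen(?:,|$))(?=(?:.*,)?OU=admin(?:,|$)).*$"
)

def matches_lookahead(dn, pattern=LOOSE):
    """Return True if dn satisfies every lookahead in the pattern."""
    return pattern.match(dn) is not None
```

With LOOSE, the order of the two components does not matter, but OU=Karen2,OU=administrator is accepted as well; STRICT rejects it.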
Eugene Ryabtsev answered Sep 21 '22 20:09


You realise you don't have to do everything with a single regex, or even with a regex at all.

Regular expressions are very good for catching classes of input but, if you have two totally fixed strings, you can just use a contains()-type method for each of them and then AND the results.

Alternatively, if you need to use regexes, you can do that twice (once per string) and AND the results together.
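
Both variants might look like this in Python; a minimal sketch, with function names invented for illustration. Python's in operator plays the role of the contains()-type method here. Like any substring test, both versions would also accept a component such as OU=Karen2.

```python
import re

def has_both_substrings(dn, first="OU=Karen", second="OU=admin"):
    # Two plain substring tests, ANDed together.
    return first in dn and second in dn

def has_both_regexes(dn, first=r"OU=Karen", second=r"OU=admin"):
    # Two independent regex searches (one per string), ANDed together.
    found_first = re.search(first, dn) is not None
    found_second = re.search(second, dn) is not None
    return found_first and found_second
```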

If you need to do it with a single regex, you could try something like:

,OU=Karen,.*,OU=admin,|,OU=admin,.*,OU=Karen,

but you'll then have to also worry about when those stanzas appear at the start or end of the line, and all sorts of other edge cases (one or both at start or end, both next to each other, names like Karen7 or administrator-lesser, and so on).

Having to allow for all possibilities will probably end up with something monstrous like:

^OU=Karen(,[^,]*)*,OU=admin,|
^OU=Karen(,[^,]*)*,OU=admin$|
,OU=Karen(,[^,]*)*,OU=admin,|
,OU=Karen(,[^,]*)*,OU=admin$|
^OU=admin(,[^,]*)*,OU=Karen,|
^OU=admin(,[^,]*)*,OU=Karen$|
,OU=admin(,[^,]*)*,OU=Karen,|
,OU=admin(,[^,]*)*,OU=Karen$

although, with a sufficiently advanced regex engine, this may be reducible to something smaller (although it would be unlikely to be any faster, simply because of all the forward-looking/back-tracking).
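
One possible reduction, sketched in Python and assuming an engine with non-capturing groups and alternation (Python's re module is used here). The start/comma and comma/end boundary cases from the eight alternatives are folded into (?:^|,) and (?:,|$), leaving one alternative per ordering. The names COMPONENT_AWARE and matches_condensed are made up for this example.

```python
import re

# One alternative per ordering; boundaries are folded into small groups.
COMPONENT_AWARE = re.compile(
    r"(?:^|,)OU=Karen(?:,[^,]*)*,OU=admin(?:,|$)"
    r"|"
    r"(?:^|,)OU=admin(?:,[^,]*)*,OU=Karen(?:,|$)"
)

def matches_condensed(dn):
    """Return True if both components appear as whole comma-delimited items."""
    return COMPONENT_AWARE.search(dn) is not None
```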

One way to improve on that without a complex regex is to massage your string slightly beforehand so that boundary checks aren't needed:

newString = "," + origString.replace (",", ",,") + ","

so that it starts and ends with a comma and all commas within it are duplicated:

,OU=Karen,,OU=Office,,OU=admin,,DC=corp,,DC=Fabrikam,,DC=COM,

Then you need only check for the much simpler:

,OU=Karen,.*,OU=admin,|,OU=admin,.*,OU=Karen,

and this removes all the potential problems mentioned:

  • either component at the start of the string.
  • either component at the end of the string.
  • both components abutting each other.
  • extended names like Karen2 being matched accidentally.
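
The comma-doubling trick can be tried out with a short Python sketch. The helper names pad_commas and matches_padded are invented here; the regex and the string transformation are the ones given above.

```python
import re

# The simple pattern from above, applied to the padded string.
SIMPLE = re.compile(r",OU=Karen,.*,OU=admin,|,OU=admin,.*,OU=Karen,")

def pad_commas(dn):
    # Double every inner comma and wrap the string in commas, so that
    # each component ends up surrounded by its own pair of commas.
    return "," + dn.replace(",", ",,") + ","

def matches_padded(dn):
    return SIMPLE.search(pad_commas(dn)) is not None
```

Because every component carries its own leading and trailing comma after padding, two adjacent components such as OU=Karen,OU=admin still match with an empty .* between them.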

Probably the best way to do this (if your language allows) is to simply split the string on commas and examine them, something like:

str = "OU=Karen,OU=Office,OU=admin,DC=corp,DC=Fabrikam,DC=COM"
elems[] = str.splitOn(",")

gotKaren = false
gotAdmin = false
for each elem in elems:
    if elem = "OU=Karen": gotKaren = true
    if elem = "OU=admin": gotAdmin = true

if gotKaren and gotAdmin:
    weaveYourMagicHere()

This both ignores the order in which they may appear and bypasses any regex "gymnastics" that may be required to detect the edge cases.

It also has the advantage of probably being more readable than the equivalent regex :-)
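
For a runnable take on the pseudocode, here is a minimal Python sketch. The function name has_all_components and its required parameter are hypothetical, and it assumes that no component value contains an escaped comma, which a plain split would cut in two.

```python
def has_all_components(dn, required=("OU=Karen", "OU=admin")):
    """Return True if every required item is a whole component of dn."""
    # Split on commas and compare whole components, so that
    # "OU=Karen2" never counts as "OU=Karen".
    components = set(dn.split(","))
    return all(item in components for item in required)
```

Using a set keeps the check independent of order and makes it easy to require more than two components.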

paxdiablo answered Sep 19 '22 20:09