
How do I include a boolean AND within a regex?

Tags:

python

regex

Is there a way to get a single regex to satisfy this condition?

I am looking for a "word" that has three letters from the set MBDPI, in any order, but MUST contain an I.

i.e.

re.match("[MBDPI]{3}", foo) and "I" in foo

This gives the correct result (in Python, using the re module), but can I get the same from a single regex?

>>> import re
>>> for foo in ("MBI", "MIB", "BIM", "BMI", "IBM", "IMB", "MBD"):
...     print(foo, bool(re.match("[MBDPI]{3}", foo) and "I" in foo))
...
MBI True
MIB True
BIM True
BMI True
IBM True
IMB True
MBD False

With regex I know I can use | as a boolean OR operator, but is there a boolean AND equivalent?

Or maybe I need some kind of lookahead or lookbehind?

user213043 asked Mar 05 '10 09:03



2 Answers

You can fake a boolean AND by using lookaheads. According to http://www.regular-expressions.info/lookaround2.html, this will work for your case (written as a raw string so that Python does not read \b as a backspace):

r"\b(?=[MBDPI]{3}\b)\w*I\w*"
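A quick check of the lookahead pattern against the words from the question might look like the sketch below. The names `PATTERN` and `has_i_word` are illustrative and not part of the original answer; the pattern is passed as a raw string so that `\b` reaches the regex engine as a word boundary.

```python
import re

# The lookahead asserts "three letters from the set, then a word boundary"
# without consuming anything; the main pattern then requires an I
# somewhere in that same word.
PATTERN = re.compile(r"\b(?=[MBDPI]{3}\b)\w*I\w*")

def has_i_word(text):
    """Return True if text contains a three-letter MBDPI word with an I."""
    return PATTERN.search(text) is not None

for foo in ("MBI", "MIB", "BIM", "BMI", "IBM", "IMB", "MBD"):
    print(foo, has_i_word(foo))
```

Both conditions are tested at the same starting position, which is what makes the lookahead behave like an AND.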
Jens answered Sep 19 '22 06:09


An alternation (OR) is about the only thing you can do:

\b(I[MBDPI]{2}|[MBDPI]I[MBDPI]|[MBDPI]{2}I)\b

The \b assertion matches a zero-width word boundary. With one at each end, the match has to be a whole word that is exactly three characters long.

You're otherwise running into the limits of basic regex syntax, which has alternation but no intersection (AND) operator.
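As a rough illustration of the alternation approach, the sketch below pulls matching words out of a longer string. `OR_PATTERN`, `find_words`, and the sample text are made up for the example.

```python
import re

# One branch per possible position of the mandatory I.
OR_PATTERN = re.compile(r"\b(I[MBDPI]{2}|[MBDPI]I[MBDPI]|[MBDPI]{2}I)\b")

def find_words(text):
    """Return every three-letter MBDPI word in text that contains an I."""
    return [m.group(1) for m in OR_PATTERN.finditer(text)]

# MBD has no I and MBIP is four letters long, so both are skipped.
print(find_words("IBM MBD BIM MBIP PIP"))  # ['IBM', 'BIM', 'PIP']
```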

An alternative is to match:

\b[MBDPI]{3}\b

capture the matched word, and then check it for an I in code.
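A minimal sketch of this two-step approach, assuming the check for I is done in plain Python after the regex has matched (`SHAPE` and `find_words_two_step` are hypothetical names):

```python
import re

# Step 1: the regex checks only the shape (a whole word made of
# exactly three letters from the set).
SHAPE = re.compile(r"\b[MBDPI]{3}\b")

def find_words_two_step(text):
    """Return three-letter MBDPI words from text that also contain an I."""
    # Step 2: ordinary Python applies the extra "must contain I" condition.
    return [word for word in SHAPE.findall(text) if "I" in word]

print(find_words_two_step("IBM MBD BIM MBIP PIP"))  # ['IBM', 'BIM', 'PIP']
```

This is essentially the `re.match(...) and "I" in foo` test from the question, tightened with word boundaries.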

Edit: for the sake of having a complete answer, I'll adapt Jens' answer, which uses the technique described on regular-expressions.info under "Testing The Same Part of a String for More Than One Requirement":

\b(?=[MBDPI]{3}\b)\w*I\w*

with the word boundary checks to ensure it's only three characters long.

This is a bit more of an advanced solution and applicable in more situations but I'd generally favour what's easier to read (being the "or" version imho).
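To illustrate why the lookahead form applies in more situations: each extra requirement is just one more lookahead, whereas the "or" version would need a branch for every arrangement. The sketch below assumes a made-up second requirement (the word must also contain a P) that is not part of the original question; `BOTH` and `has_i_and_p` are hypothetical names.

```python
import re

# Two stacked lookaheads act as two ANDed conditions, both tested at
# the start of the word, before the three letters are consumed.
BOTH = re.compile(r"\b(?=\w*I)(?=\w*P)[MBDPI]{3}\b")

def has_i_and_p(text):
    """Return True if text has a three-letter MBDPI word with an I and a P."""
    return BOTH.search(text) is not None

for foo in ("PIM", "IPB", "IBM", "MBD"):
    print(foo, has_i_and_p(foo))
```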

cletus answered Sep 17 '22 06:09