
Regex: Specified words in any order

I'm not good at regex and am trying to write two patterns.

Regex1:

All specified words in any order but nothing else. (repetition allowed).

Regex2:

All specified words in any order but nothing else. (repetition not allowed).

Words:

aaa, bbb, ccc

Strings:

aaa ccc bbb
aaa ccc
aaa bbb ddd ccc
bbb aaa bbb ccc

Regex1 evaluates the above strings as:

true -> all words present in any order
false -> bbb is missing
false -> unknown word 'ddd'
true -> all words present in any order; repetition is allowed

Regex2 evaluates the above strings as:

true -> all words present in any order
false -> bbb is missing
false -> unknown word 'ddd'
false -> repetition not allowed

My Attempt

/^(?=.*\baaa\b)(?=.*\bbbb\b)(?=.*\bccc\b).*$/

I'm asking for learning purposes, so please explain how your pattern works.

shajji asked Mar 12 '19 07:03


2 Answers

Without repetition (regex101):

^(?:(aaa|bbb|ccc)(?!.*?\b\1) ?\b){3}$

And with repetition (regex101):

^(?=.*?\baaa)(?=.*?\bbbb)(?=.*?\bccc)(?:(aaa|bbb|ccc) ?\b)+$

Two more ideas. An explanation of each regex is shown on the right side of its regex101 page.
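To see how the two patterns above behave on the four sample strings from the question, a small harness along these lines can be used. The variable names (`noRepetition`, `withRepetition`, `samples`, `results`) are illustrative and not part of the answer; only the two regex literals are copied from it.

```javascript
// Patterns copied verbatim from the answer above.
const noRepetition = /^(?:(aaa|bbb|ccc)(?!.*?\b\1) ?\b){3}$/;
const withRepetition = /^(?=.*?\baaa)(?=.*?\bbbb)(?=.*?\bccc)(?:(aaa|bbb|ccc) ?\b)+$/;

// Sample strings from the question.
const samples = ['aaa ccc bbb', 'aaa ccc', 'aaa bbb ddd ccc', 'bbb aaa bbb ccc'];

// Test every sample against both patterns and collect the results.
const results = samples.map(s => ({
  input: s,
  noRepetition: noRepetition.test(s),
  withRepetition: withRepetition.test(s),
}));

console.table(results);
```

Only the last sample, 'bbb aaa bbb ccc', should differ between the two columns, since it contains every word but repeats bbb.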

bobble bubble answered Sep 27 '22 20:09


For Regex 1:

var re = /^(?=.*?\baaa\b)(?=.*?\bbbb\b)(?=.*?\bccc\b)\b(?:aaa|bbb|ccc)\b(?: +\b(?:aaa|bbb|ccc)\b)*$/;
var res = document.getElementById('result');
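res.innerText += re.test('aaa ccc bbb');        // true
res.innerText += ', ' + re.test('aaa ccc ddd'); // false: unknown word 'ddd', bbb missing
res.innerText += ', ' + re.test('aaa ddd bbb'); // false: unknown word 'ddd', ccc missing
res.innerText += ', ' + re.test('ccc bbb ccc'); // false: aaa missing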
<div id="result"></div>

Your code already does part of the trick. Your positive lookaheads check that all words appear somewhere, but not that they are the only words present, because the trailing .* accepts anything. To achieve this, I replaced the .* with an explicit list of allowed words. After the circumflex (^), which anchors the start of the string, and the lookaheads, the non-capturing group \b(?:aaa|bbb|ccc)\b matches the first word. This is followed by any number of further words, each preceded by at least one space: (?: +\b(?:aaa|bbb|ccc)\b)* is basically the same pattern with a space and + in front, wrapped in a group and repeated with *. Finally, the dollar sign ($) requires the string to end there, so nothing else can follow.
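The same construction can be generated from a word list instead of being typed by hand. The following is only a sketch: `escapeRegex` and `buildAllWordsRegex` are hypothetical helper names, and it assumes every word starts and ends with a word character (letter, digit, or underscore), because \b only behaves as a word boundary in that case.

```javascript
// Hypothetical helper: escape regex metacharacters so a word is matched literally.
function escapeRegex(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build the "all words, any order, repetition allowed" pattern.
function buildAllWordsRegex(words) {
  const escaped = words.map(escapeRegex);
  // One positive lookahead per required word.
  const lookaheads = escaped.map(w => `(?=.*?\\b${w}\\b)`).join('');
  // Alternation that matches exactly one allowed word.
  const anyWord = `\\b(?:${escaped.join('|')})\\b`;
  // First word, then any number of space-separated further words, then end of string.
  return new RegExp(`^${lookaheads}${anyWord}(?: +${anyWord})*$`);
}

const re1 = buildAllWordsRegex(['aaa', 'bbb', 'ccc']);
console.log(re1.source);
console.log(re1.test('bbb aaa bbb ccc')); // true
console.log(re1.test('aaa bbb ddd ccc')); // false
```

For the three words from the question, the generated source is the same pattern as the hand-written Regex 1.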

For Regex 2:

The basic strategy is the same. You additionally check, with a negative lookahead, that no word occurs a second time:

//var re = /^(?=.*?\baaa\b)(?!.*?\baaa\b.*?\baaa\b)(?=.*?\bbbb\b)(?!.*?\bbbb\b.*?\bbbb\b)(?=.*?\bccc\b)(?!.*?\bccc\b.*?\bccc\b)\b(?:aaa|bbb|ccc)\b(?:\s+\b(?:aaa|bbb|ccc)\b)*$/;
// optimized version, see comments
var re = /^(?=.*?\baaa\b)(?=.*?\bbbb\b)(?=.*?\bccc\b)(?!.*?\b(\w+)\b.*?\b\1\b)\b(?:aaa|bbb|ccc)\b(?: +\b(?:aaa|bbb|ccc)\b)*$/;
var res = document.getElementById('result');
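res.innerText += re.test('aaa ccc bbb');            // true
res.innerText += ', ' + re.test('aaa ccc ddd');     // false: unknown word 'ddd', bbb missing
res.innerText += ', ' + re.test('aaa bbb aaa');     // false: aaa repeated, ccc missing
res.innerText += ', ' + re.test('aaa ccc bbb ccc'); // false: ccc repeated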
<div id="result"></div>

In the first (commented-out) version, we have the positive lookahead (?=.*?\bword\b) to check that the word exists. We follow it with a negative lookahead, for example (?!.*?\baaa\b.*?\baaa\b) for aaa, to check that the word does not occur more than once. Repeat for all words. Presto!

Update: Instead of checking the specific words aren't repeated, we can also check that NO word is repeated by using the (?!.*?\b(\w+)\b.*?\b\1\b) construct. This makes the regex more concise. Thanks to @revo for pointing it out.
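To see what the generic no-repeat lookahead does in isolation, it can be tested without the rest of the pattern. This is a minimal sketch; the name `noRepeatedWord` and the third test string are illustrative and not from the answer.

```javascript
// The negative lookahead on its own: it rejects a string as soon as
// some whole word (\w+ between word boundaries) occurs a second time.
const noRepeatedWord = /^(?!.*?\b(\w+)\b.*?\b\1\b)/;

console.log(noRepeatedWord.test('aaa ccc bbb'));     // true
console.log(noRepeatedWord.test('aaa ccc bbb ccc')); // false: ccc occurs twice
console.log(noRepeatedWord.test('aaa aaab'));        // true: "aaab" is a different word
```

Because the captured group is \w+ and not the specific word list, the lookahead does not need to change when words are added; the allowed words are still enforced by the rest of the pattern.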

Christoph Herold answered Sep 27 '22 22:09