Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regex to match string containing two names in any order

Tags:

string

regex

People also ask

How do you search for multiple words in a regular expression?

However, to recognize multiple words in any order using regex, I'd suggest the use of quantifier in regex: (\b(james|jack)\b. *){2,} . Unlike lookaround or mode modifier, this works in most regex flavours.

What does regex (? S match?

i) makes the regex case insensitive. (? s) for "single line mode" makes the dot match all characters, including line breaks.

What does \\ mean in regex?

\\. matches the literal character . . the first backslash is interpreted as an escape character by the Emacs string reader, which combined with the second backslash, inserts a literal backslash character into the string being read. the regular expression engine receives the string \.

What does regex 0 * 1 * 0 * 1 * Mean?

Basically (0+1)* mathes any sequence of ones and zeroes. So, in your example (0+1)*1(0+1)* should match any sequence that has 1. It would not match 000 , but it would match 010 , 1 , 111 etc. (0+1) means 0 OR 1.


You can do checks using positive lookaheads. Here is a summary from the indispensable regular-expressions.info:

Lookahead and lookbehind, collectively called “lookaround”, are zero-length assertions...lookaround actually matches characters, but then gives up the match, returning only the result: match or no match. That is why they are called “assertions”. They do not consume characters in the string, but only assert whether a match is possible or not.

It then goes on to explain that positive lookaheads are used to assert that what follows matches a certain expression without taking up characters in that matching expression.

So here is an expression using two subsequent postive lookaheads to assert that the phrase matches jack and james in either order:

^(?=.*\bjack\b)(?=.*\bjames\b).*$

Test it.

The expressions in parentheses starting with ?= are the positive lookaheads. I'll break down the pattern:

  1. ^ asserts the start of the expression to be matched.
  2. (?=.*\bjack\b) is the first positive lookahead saying that what follows must match .*\bjack\b.
  3. .* means any character zero or more times.
  4. \b means any word boundary (white space, start of expression, end of expression, etc.).
  5. jack is literally those four characters in a row (the same for james in the next positive lookahead).
  6. $ asserts the end of the expression to me matched.

So the first lookahead says "what follows (and is not itself a lookahead or lookbehind) must be an expression that starts with zero or more of any characters followed by a word boundary and then jack and another word boundary," and the second look ahead says "what follows must be an expression that starts with zero or more of any characters followed by a word boundary and then james and another word boundary." After the two lookaheads is .* which simply matches any characters zero or more times and $ which matches the end of the expression.

"start with anything then jack or james then end with anything" satisfies the first lookahead because there are a number of characters then the word jack, and it satisfies the second lookahead because there are a number of characters (which just so happens to include jack, but that is not necessary to satisfy the second lookahead) then the word james. Neither lookahead asserts the end of the expression, so the .* that follows can go beyond what satisfies the lookaheads, such as "then end with anything".

I think you get the idea, but just to be absolutely clear, here is with jack and james reversed, i.e. "start with anything then james or jack then end with anything"; it satisfies the first lookahead because there are a number of characters then the word james, and it satisfies the second lookahead because there are a number of characters (which just so happens to include james, but that is not necessary to satisfy the second lookahead) then the word jack. As before, neither lookahead asserts the end of the expression, so the .* that follows can go beyond what satisfies the lookaheads, such as "then end with anything".

This approach has the advantage that you can easily specify multiple conditions.

^(?=.*\bjack\b)(?=.*\bjames\b)(?=.*\bjason\b)(?=.*\bjules\b).*$

Try:

james.*jack

If you want both at the same time, then or them:

james.*jack|jack.*james

Explanation of command that i am going to write:-

. means any character, digit can come in place of .

* means zero or more occurrences of thing written just previous to it.

| means 'or'.

So,

james.*jack

would search james , then any number of character until jack comes.

Since you want either jack.*james or james.*jack

Hence Command:

jack.*james|james.*jack

Its short and sweet

(?=.*jack)(?=.*james)

Test Cases:

[
  "xxx james xxx jack xxx",
  "jack xxx james ",
  "jack xxx jam ",
  "  jam and jack",
  "jack",
  "james",
]
.forEach(s => console.log(/(?=.*james)(?=.*jack)/.test(s)) )

You can do:

\bjack\b.*\bjames\b|\bjames\b.*\bjack\b