
Regular expression to check if the first two words are the same

Tags: regex, php

For example:

$s1 = "Test Test the rest of string";
$s2 = "Test the rest of string";

I would like to match $s1 but not $s2, because the first word in $s1 is the same as the second. The word 'Test' is only an example; the regular expression should work for any word.

user594791 asked Jan 30 '11 15:01





2 Answers

if(preg_match('/^(\w+)\s+\1\b/',$input)) {
  // $input has same first two words.
}

Explanation:

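^    : Start anchor (beginning of the string)
(    : Start of capturing group
 \w+ : One or more word characters (the first word)
)    : End of capturing group
\s+  : One or more whitespace characters
\1   : Back reference to the first word
\b   : Word boundary, so the second word cannot merely start with the first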
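
To see the pattern above in action, here is a minimal sketch that wraps it in a function and runs it against the two strings from the question, plus one case where the second word only begins with the first. The function name hasRepeatedFirstWord is a hypothetical helper chosen for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper; the pattern itself is the one from the answer.
function hasRepeatedFirstWord(string $input): bool
{
    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match('/^(\w+)\s+\1\b/', $input) === 1;
}

var_dump(hasRepeatedFirstWord("Test Test the rest of string")); // bool(true)
var_dump(hasRepeatedFirstWord("Test the rest of string"));      // bool(false)
// The trailing \b rejects a second word that only starts with the first.
var_dump(hasRepeatedFirstWord("Test Testing the rest"));        // bool(false)
```

Comparing against 1 with === treats a preg_match error the same as "no match", which is probably what you want for a simple check like this.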
codaddict answered Sep 24 '22 03:09



~^(\w+)\s+\1(?:\W|$)~
~^(\pL+)\s+\1(?:\PL|$)~u // unicode variant

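\1 is a back reference to the first capturing group. (?:\W|$) requires the repeated word to be followed by a non-word character or the end of the string; the unicode variant uses \pL (any letter) and \PL (any non-letter) instead.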
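
As a rough illustration of how the unicode variant behaves, the sketch below assumes the input and the source file are UTF-8 encoded. The function name hasRepeatedFirstWordUnicode is hypothetical and only used for this example.

```php
<?php
// Hypothetical helper wrapping the unicode variant of the pattern.
function hasRepeatedFirstWordUnicode(string $input): bool
{
    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_match('~^(\pL+)\s+\1(?:\PL|$)~u', $input) === 1;
}

var_dump(hasRepeatedFirstWordUnicode("Über Über den Rest")); // bool(true)
var_dump(hasRepeatedFirstWordUnicode("Über Überall"));       // bool(false)
// \pL matches letters only, so a leading number is not treated as a word here.
var_dump(hasRepeatedFirstWordUnicode("42 42 times"));        // bool(false)
```

Note the difference from the \w version: \w also accepts digits and underscores, so "42 42 times" would match there but does not match with \pL.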

NikiC answered Sep 23 '22 03:09
