
Match ":)" smiley followed by word boundary

I am trying to match smileys followed by a word boundary \b.

Let's say I wanna match :p and :) followed by \b.

/(:p)\b/ works fine, but why does /(:\))\b/ behave the opposite way?

asked Apr 27 '15 09:04 by httpete



2 Answers

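You cannot use a word boundary here because ) is a non-word character. After ), \b can only match when the next character is a word character, which is the opposite of what you want. After p, which is a word character, \b matches when the next character is a non-word character or the string ends.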

Simply put: \b allows you to perform a whole words only search using a regular expression in the form of \bword\b. A word character is a character that can be used to form words. All characters that are not word characters are non-word characters.
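To make the asymmetry concrete, here is a minimal sketch using Python's re module. That choice is an assumption, since the question does not name a regex engine, and the helper name captured is made up for illustration. It runs the two patterns from the question against inputs where the smiley is followed by a space and by a letter.

```python
import re

# The two patterns from the question.
P_SMILEY = re.compile(r"(:p)\b")
PAREN_SMILEY = re.compile(r"(:\))\b")

def captured(pattern, text):
    """Return the text of group 1 for every match of pattern in text."""
    return [m.group(1) for m in pattern.finditer(text)]

# ":p" ends in a word character, so \b matches before a space
# (or at the end of the string) but not before another letter.
print(captured(P_SMILEY, "hi :p there"))      # [':p']
print(captured(P_SMILEY, "hi :pa"))           # []

# ":)" ends in a non-word character, so \b matches only when
# a word character follows, which is the reverse of the case above.
print(captured(PAREN_SMILEY, "hi :) there"))  # []
print(captured(PAREN_SMILEY, "hi :)a"))       # [':)']
```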

Use (:\)) to match :) and capture it in the first capturing group.

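Use /(:\))(?![a-z0-9_])/i to avoid matching a :) that is immediately followed by a letter, digit, or underscore. Because ) is a non-word character, this is equivalent to (:\))\B, as long as \w is limited to the ASCII set [A-Za-z0-9_].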

\B is the negated version of \b. \B matches at every position where \b does not. Effectively, \B matches at any position between two word characters as well as at any position between two non-word characters.
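As a rough check of that equivalence, the sketch below compares the lookahead form with the \B form on a few ASCII inputs. Python's re module and the sample strings are assumptions made for illustration. re.ASCII is passed so that \B uses the same [A-Za-z0-9_] notion of a word character as the explicit character class.

```python
import re

# Lookahead form: ":)" not followed by an ASCII word character.
LOOKAHEAD = re.compile(r"(:\))(?![a-z0-9_])", re.IGNORECASE)
# \B form: ":)" followed by a position that is not a word boundary.
NOT_BOUNDARY = re.compile(r"(:\))\B", re.ASCII)

# Hypothetical sample inputs covering end of string, space,
# letter, punctuation, and underscore after the smiley.
SAMPLES = ["nice :)", "nice :) day", "nice :)day", "nice :)), ok", "nice :)_"]

def found(pattern, text):
    """True if pattern matches anywhere in text."""
    return pattern.search(text) is not None

for sample in SAMPLES:
    # Both columns agree for every sample above.
    print(repr(sample), found(LOOKAHEAD, sample), found(NOT_BOUNDARY, sample))
```

Both patterns accept the smiley at the end of the string, before a space, and before punctuation, and both reject it before a letter or an underscore.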

See demo 1 and demo 2.

answered Sep 28 '22 03:09 by Wiktor Stribiżew


In addition to stribizhev's answer: you can use (:\))\B

Examples for when to use what:

\b : string = That man is batman. regex = \bman\b matches only the standalone man and not the man in batman, because the position between t and m is not a word boundary (both are word characters).

\B : string = I am bat-man and he is super - man. regex = \B-\B matches the - in super - man, whereas \b-\b matches the - in bat-man: the positions between t and - and between - and m are word boundaries, while the positions between a space and - are not.

Note: This is easier to understand if you think of \b and \B as positions between two characters: \b matches where the transition is word to non-word (or non-word to word), and \B matches where it is word to word or non-word to non-word.
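The two example strings above can be checked with a short sketch. Python's re module is an assumed choice of engine, and the helper name contexts is hypothetical. It returns each match together with one character of context on either side, so it is visible which occurrence matched.

```python
import re

def contexts(pattern, text):
    """Return each match plus one neighbouring character on each side."""
    return [
        text[max(m.start() - 1, 0):m.end() + 1]
        for m in re.finditer(pattern, text)
    ]

# \bman\b: only the standalone "man" matches; the one in "batman"
# has a word character (t) directly before it.
print(contexts(r"\bman\b", "That man is batman."))  # [' man ']

sentence = "I am bat-man and he is super - man."
# \b-\b: the hyphen has word characters on both sides (t and m).
print(contexts(r"\b-\b", sentence))  # ['t-m']
# \B-\B: the hyphen has non-word characters (spaces) on both sides.
print(contexts(r"\B-\B", sentence))  # [' - ']
```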

answered Sep 28 '22 03:09 by karthik manchala