PCRE matching whole words in a string

Tags: php, pcre

I'm trying to run a regexp in PHP (preg_match_all) that matches certain whole words in a string, but the problem is that it also matches words that contain only part of a tested word. This is a sub-pattern of a larger regexp, so other PHP functions like strpos won't help me, sadly.

String: "I test a string"

Words to match: "testable", "string"

Tried regexp: /([testable|string]+)/

Expected result: "string" only!

Result: "test", "a", "string"

Inoryy asked Oct 05 '11 14:10



2 Answers

If you really want to make sure you only get your words and not words that contain them, then you can use word boundary anchors:

/\b(testable|string)\b/

This will match only a word boundary followed by either testable or string and then another word boundary.
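As a minimal sketch of how this pattern might be used from PHP, the snippet below runs it through preg_match_all against the sample string from the question. The variable names are only illustrative.

```php
<?php
// Sample input and word list taken from the question.
$subject = 'I test a string';

// \b on both sides means each alternative must be a whole word.
$pattern = '/\b(testable|string)\b/';

// preg_match_all returns the number of full matches;
// $matches[0] holds the full matches, $matches[1] the first capture group.
$count = preg_match_all($pattern, $subject, $matches);

print_r($matches[0]);
```

With this input only "string" should be reported: "test" is not in the word list, and "testable" does not occur in the string.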

Andrew answered Sep 22 '22 11:09


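You don't want a character class with []. A character class matches any single character from the set, not the words themselves. You just want to match the words with alternation: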

/testable|string/
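To make the difference concrete, here is a small sketch comparing the character-class pattern from the question with the alternation. The input 'hamstrings' is a made-up example, added only to show that alternation without word boundaries can still match inside a longer word.

```php
<?php
$subject = 'I test a string';

// Character class: matches any run of the characters
// t, e, s, a, b, l, |, r, i, n, g, in any order.
preg_match_all('/([testable|string]+)/', $subject, $class);

// Alternation: matches the literal text "testable" or "string".
preg_match_all('/testable|string/', $subject, $alt);

// Without \b the alternation also matches inside a longer word.
// 'hamstrings' is a hypothetical input for illustration.
$inside = preg_match_all('/testable|string/', 'hamstrings', $partial);

print_r($class[0]);   // the question's unwanted result
print_r($alt[0]);     // only the listed word
print_r($partial[0]); // a partial-word match
```

The character class reproduces the result reported in the question ("test", "a", "string"), while the alternation finds only "string". If partial-word matches such as the one in 'hamstrings' are a concern, wrapping the alternation in \b anchors avoids them.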

e.dan answered Sep 22 '22 11:09