Meaning of a simple pattern of preg_replace (#\s+#)?

Tags:

php

Sorry for the very basic question, but there's simply no easy way to search for a string like that, either here or in Google or SymbolHound. I also haven't found an answer in the PHP Manual (Pattern Syntax & preg_replace).

This code is inside a function that receives the $content and $length parameters.
What does that preg_replace do?

$the_string = preg_replace('#\s+#', ' ', $content);
$words = explode(' ', $the_string);

if( count($words) <= $length ) 

Also, would it be better to use str_word_count instead?

brasofilo asked Jul 17 '12 11:07


3 Answers

This pattern replaces each run of successive whitespace characters (note: not just spaces, but also line breaks and tabs) with a single, conventional space (' '). \s+ says "match a sequence made up of one or more whitespace characters".
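
As a minimal sketch of that behaviour, the snippet below runs the pattern from the question against a made-up $content value (the sample string is an assumption, not taken from the original function) and then splits the result the same way the question does.

```php
<?php
// Hypothetical input containing tabs, a line break, and repeated spaces.
$content = "Hello\t\tworld,\n  how   are you?";

// Collapse every run of whitespace into one ordinary space.
$the_string = preg_replace('#\s+#', ' ', $content);
echo $the_string, "\n"; // Hello world, how are you?

// With single spaces guaranteed, explode() yields one element per word.
$words = explode(' ', $the_string);
echo count($words), "\n"; // 5
```

If $content can start or end with whitespace, the result keeps one leading or trailing space and explode() then returns empty strings at the ends, so calling trim() first is probably worthwhile.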

The # signs are delimiters for the pattern. It is probably more common to see patterns delimited by forward slashes. (The preg_* functions always require delimiters; only the older POSIX ereg functions, deprecated in PHP 5.3 and removed in PHP 7, took patterns without them.)

http://php.net/manual/en/regexp.reference.delimiters.php
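
To illustrate that the delimiter is only a wrapper around the pattern, this short sketch applies \s+ with three different delimiters to an assumed sample string and checks that the results are identical.

```php
<?php
// Hypothetical sample text with mixed whitespace.
$content = "a \t b\n\nc";

// The same pattern written with three common delimiters.
$patterns = ['#\s+#', '/\s+/', '~\s+~'];

$results = [];
foreach ($patterns as $pattern) {
    $results[] = preg_replace($pattern, ' ', $content);
}

// All three calls produce the same string.
var_dump(count(array_unique($results)) === 1); // bool(true)
echo $results[0], "\n"; // a b c
```

One practical reason to pick # or ~ is that a pattern containing forward slashes (a URL, for example) then needs no escaping.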

Relying on spaces to find words in a string is generally not the best approach - we can use the \b word boundary marker instead.

$sentence = "Hello, there. How are you today? Hope you're OK!";
preg_match_all('/\b[\w-]+\b/', $sentence, $words);

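That says: grab all substrings of the larger string that consist only of word characters (letters, digits, or underscores) or hyphens, and that are enclosed by word boundaries.

$words[0] is now an array of the words used in the sentence (preg_match_all stores the full-pattern matches in the first element of its third argument). Note that the apostrophe is not matched, so "you're" is split into "you" and "re".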
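
A runnable version of the snippet above, as a sketch: the variable names $matches and $count are introduced here for illustration. It shows where preg_match_all puts the matches and how the apostrophe in "you're" affects the result.

```php
<?php
$sentence = "Hello, there. How are you today? Hope you're OK!";

// preg_match_all returns the number of full-pattern matches and fills
// $matches[0] with the matched substrings.
$count = preg_match_all('/\b[\w-]+\b/', $sentence, $matches);
$words = $matches[0];

print_r($words);
echo $count, "\n"; // 10, because "you're" is matched as "you" and "re"
```

If contractions should count as one word, adding the apostrophe to the character class ([\w'-]) is one possible adjustment.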

Mitya answered Nov 01 '22 16:11


# is the delimiter.

Often used delimiters are forward slashes (/), hash signs (#) and tildes (~). Your code uses hash signs:

$the_string = preg_replace('#\s+#', ' ', $content);

It replaces each run of one or more whitespace characters (\s+) with a single space.

diEcho answered Nov 01 '22 16:11


\s+ matches one or more whitespace characters. You are replacing each such run with a single space, using preg_replace('#\s+#', ' ', $content);

str_word_count might be suitable, but you may need to specify additional characters that should count as part of a word; otherwise the function reports wrong values for strings containing UTF-8 characters.

str_word_count($str, 1, $characters_to_treat_as_part_of_a_word);

EXAMPLE:

print_r(str_word_count('holóeóó what',1));

returns

Array ( [0] => hol [1] => e [2] => what )
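
One way around this, offered as an alternative technique rather than something from the answer itself, is to match Unicode letters with preg_match_all and the u modifier. The sketch assumes the source file and the string are UTF-8 encoded.

```php
<?php
// Same sample string as in the example above (assumed to be UTF-8).
$str = 'holóeóó what';

// \p{L} matches any Unicode letter; the u modifier tells PCRE to treat
// both the pattern and the subject as UTF-8.
preg_match_all('/\p{L}+/u', $str, $matches);

print_r($matches[0]);            // two elements: holóeóó and what
echo count($matches[0]), "\n";   // 2
```

Digits, hyphens, or apostrophes are not matched by \p{L}, so the character class would need extending if those should count as part of a word.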

Anirudh Ramanathan answered Nov 01 '22 17:11