
How could I find all whitespaces excluding the ones between quotes?

Tags: regex, php

I need to split a string on spaces, but phrases in quotes should be kept intact. Example:

  word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5

This should result in the following array after preg_split:

array(
 [0] => 'word1',
 [1] => 'word2',
 [2] => 'this is a phrase',
 [3] => 'word3',
 [4] => 'word4',
 [5] => 'this is a second phrase',
 [6] => 'word5'
)

How should I compose my regexp to do that?

PS. There is a related question, but I don't think it works in my case. The accepted answer there provides a regexp that finds words instead of whitespace.

altern asked Nov 12 '09 12:11


1 Answer

With the help of user MizardX from the #regex IRC channel (irc.freenode.net), a solution was found. It even supports single quotes.

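// Input mixing single-quoted and double-quoted phrases.
$str = 'word1 word2 \'this is a phrase\' word3 word4 "this is a second phrase" word5 word1 word2 "this is a phrase" word3 word4 "this is a second phrase" word5';

// \G anchors at the end of the previous match, the group consumes a whole
// token, and \K drops it from the match so that only \s+ is split on.
$regexp = '/\G(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)*\K\s+/';

$arr = preg_split($regexp, $str);

print_r($arr);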

The result is:

Array (
    [0] => word1
    [1] => word2
    [2] => 'this is a phrase'
    [3] => word3
    [4] => word4
    [5] => "this is a second phrase"
    [6] => word5
    [7] => word1
    [8] => word2
    [9] => "this is a phrase"
    [10] => word3
    [11] => word4
    [12] => "this is a second phrase"
    [13] => word5  
)
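Note that the tokens above keep their surrounding quotes, while the question asked for them to be dropped. The following is a minimal sketch, not part of the original answer: it writes the same pattern with the x flag so each part can carry a comment, then removes one pair of wrapping quotes per token. The helper name strip_outer_quotes and the shortened input string are assumptions made only for illustration.

```php
<?php
// Same pattern as in the answer, written with the x flag (and ~ as delimiter)
// so that whitespace is ignored and each part can be commented.
$regexp = '~
    \G                  # start where the previous match ended (or at offset 0)
    (?:                 # consume zero or more token pieces:
        "[^"]*"         #   a double-quoted phrase
      | \'[^\']*\'      #   a single-quoted phrase
      | [^"\'\s]+       #   a run of characters that are neither quotes nor whitespace
    )*
    \K                  # drop everything consumed so far from the reported match
    \s+                 # the whitespace that preg_split actually splits on
~x';

// Hypothetical helper: removes one pair of matching quotes, but only when
// the same quote character both opens and closes the whole token.
function strip_outer_quotes(string $token): string
{
    return preg_replace('/^(["\'])(.*)\1$/s', '$2', $token);
}

$str = 'word1 word2 \'this is a phrase\' word3 "this is a second phrase" word5';

$tokens = array_map('strip_outer_quotes', preg_split($regexp, $str));
print_r($tokens);
```

Tokens without wrapping quotes pass through unchanged, so the helper can be mapped over the whole array.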

PS. The only disadvantage is that this regexp requires PCRE 7.

It turned out that I do not have PCRE 7 support on the production server; only PCRE 6 is installed there. A regexp that does work there, although it is not as flexible as the PCRE 7 one, drops \G and \K:

/(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)+/

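For the given input the result is the same as above, provided the pattern is used with preg_match_all: it matches the tokens themselves, not the whitespace between them.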
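A minimal sketch of using the PCRE 6 pattern with preg_match_all instead of preg_split, since it matches tokens rather than separators. The shortened input string is an assumption for illustration only.

```php
<?php
$str = 'word1 word2 \'this is a phrase\' word3 word4 "this is a second phrase" word5';

// No \G or \K here: each match is one whole token, either a quoted phrase
// or a run of characters that are neither quotes nor whitespace.
$regexp = '/(?:"[^"]*"|\'[^\']*\'|[^"\'\s]+)+/';

preg_match_all($regexp, $str, $matches);

// $matches[0] holds the full matches, one entry per token.
$arr = $matches[0];
print_r($arr);
```

As with the preg_split version, the quotes stay part of the quoted tokens.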

altern answered Nov 14 '22 14:11