I need to parse a search string for keywords and phrases in PHP. For example,
string 1: value of "measured response" detect goal "method valuation" study
will yield: value,of,measured response,detect,goal,method valuation,study
I also need it to work if the string has:
I'm leaning towards using preg_match with the pattern '/(\".*\")/' to get the phrases into an array, then removing the phrases from the string, and finally working the keywords into the array. I just can't pull everything together!
I'm also thinking of replacing spaces outside quotes with commas, then exploding the result into an array. If that's a better option, how do I do that with preg_replace?
Is there a better way to go about this? Help! Thanks much, everyone.
preg_match_all('/(?<!")\b\w+\b|(?<=")\b[^"]+/', $subject, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[0]); $i++) {
# Matched text = $result[0][$i];
}
This should yield the results you are looking for.
Explanation:
# (?<!")\b\w+\b|(?<=")\b[^"]+
#
# Match either the regular expression below (attempting the next alternative only if this one fails) «(?<!")\b\w+\b»
# Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!")»
# Match the character “"” literally «"»
# Assert position at a word boundary «\b»
# Match a single character that is a “word character” (letters, digits, etc.) «\w+»
# Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# Assert position at a word boundary «\b»
# Or match regular expression number 2 below (the entire match attempt fails if this one fails to match) «(?<=")\b[^"]+»
# Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=")»
# Match the character “"” literally «"»
# Assert position at a word boundary «\b»
# Match any character that is NOT a “"” «[^"]+»
# Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
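A minimal runnable sketch of the pattern above, applied to the sample string from the question (the variable names are only illustrative). One caveat worth checking against your own input: because the first branch uses \w+, an unquoted keyword that contains a hyphen or other punctuation would come back as separate matches.

```php
<?php
// Sample search string taken from the question.
$subject = 'value of "measured response" detect goal "method valuation" study';

// Branch 1: a bare word that is not directly preceded by a double quote.
// Branch 2: everything after an opening double quote, up to the next quote.
preg_match_all('/(?<!")\b\w+\b|(?<=")\b[^"]+/', $subject, $result);

// $result[0] holds the full text of each match, in order of appearance.
print_r($result[0]);
// Expected entries: value, of, measured response, detect, goal,
// method valuation, study
```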
There is no need to use a regular expression: the built-in function str_getcsv can be used to explode a string with any given delimiter, enclosure, and escape characters.
It really is as simple as this:
// where $string is the string to parse
$array = str_getcsv($string, ' ', '"');
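As a rough sketch of how that call behaves on the sample string from the question: the example below also filters out empty entries, on the assumption that runs of consecutive spaces produce empty fields when a space is the delimiter. The filtering step and the variable names are my additions, not part of the original suggestion.

```php
<?php
// Sample search string taken from the question.
$string = 'value of "measured response" detect goal "method valuation" study';

// Space as delimiter, double quote as enclosure. The escape character is
// passed explicitly here; it is simply the function's traditional default.
$terms = str_getcsv($string, ' ', '"', '\\');
print_r($terms);
// Expected entries: value, of, measured response, detect, goal,
// method valuation, study

// Assumption: doubled spaces yield empty fields, so drop them and reindex.
$messy = 'value  of   "measured response"';
$clean = array_values(array_filter(
    str_getcsv($messy, ' ', '"', '\\'),
    function ($term) {
        return $term !== null && $term !== '';
    }
));
print_r($clean);
```

Compared with the regex approaches, this leans on PHP's CSV parsing rules, so it is worth trying it on input with unbalanced quotes before relying on it.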
$s = 'value of "measured response" detect goal "method valuation" study';
preg_match_all('~(?|"([^"]+)"|(\S+))~', $s, $matches);
print_r($matches[1]);
output:
Array
(
[0] => value
[1] => of
[2] => measured response
[3] => detect
[4] => goal
[5] => method valuation
[6] => study
)
The trick here is to use a branch-reset group: (?|...|...). It's just like an alternation contained in a non-capturing group, (?:...|...), except that within each branch the capturing-group numbers start at the same number. (For more info, see the PCRE docs and search for DUPLICATE SUBPATTERN NUMBERS.)
Thus, the text we're interested in is always captured in group #1. You can retrieve the contents of group #1 for all matches via $matches[1]. (That's assuming the PREG_PATTERN_ORDER flag is set; I didn't specify it like @FailedDev did because it's the default. See the PHP docs for details.)
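To make the difference concrete, here is a small comparison sketch that runs the same alternation once inside a plain non-capturing group and once inside a branch-reset group. The shortened sample string and the variable names $plain and $reset are only for illustration.

```php
<?php
$s = 'value of "measured response" detect';

// Plain non-capturing group: the two branches capture into groups 1 and 2,
// so the terms end up split across two arrays padded with empty strings.
preg_match_all('~(?:"([^"]+)"|(\S+))~', $s, $plain);
print_r($plain[1]); // quoted phrases only: '', '', 'measured response', ''
print_r($plain[2]); // bare keywords only:  'value', 'of', '', 'detect'

// Branch-reset group: both branches capture into group 1, so a single
// array holds every term in order.
preg_match_all('~(?|"([^"]+)"|(\S+))~', $s, $reset);
print_r($reset[1]); // 'value', 'of', 'measured response', 'detect'
```

With the plain group you would have to merge the two arrays yourself, which is the bookkeeping the branch-reset form avoids.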