I'm trying to complete a PHP app in the next 2 weeks and I just can't figure out the regular expression to parse some attribute strings.
I receive strings in a format like this:
KeyName1="KeyValue1" KeyName2='KeyValue2'
There may be any number of key value pairs in a single string and the values can be delimited by either single quotes ' or double quotes " in any combination within one string (but they are always delimited).
The key values can be of any length and contain any character, except that double quotes can't appear inside a double-quoted value and single quotes can't appear inside a single-quoted value. Double quotes can appear inside single quotes, and single quotes can appear inside double quotes.
The key value pairs can have any number of spaces between them, between the key name and the equals sign, and between the equals sign and the quote character that starts the key value.
I need to turn the string into an array that looks like:
$arrayName["KeyName1"] = "KeyValue1"
$arrayName["KeyName2"] = "KeyValue2"
etc.
I'm pretty sure it can be done with regular expressions but all my attempts have failed and I need some help (actually lots of help :-) to get this done and am hoping some of the amazing people here can provide that help or at least get me started.
Sure, no problem. Let's break it down:
\w+\s*=\s*
matches a keyword made of word characters (letters, digits and underscores), followed by an equals sign (which might be surrounded by whitespace).
"[^"]*"
matches an opening double quote, followed by any number of characters except another double quote, then a (closing) double quote.
'[^']*'
does the same for single quoted strings.
Combining that using capturing groups ((...)) with a simple alternation (|) gives you
(\w+)\s*=\s*("[^"]*"|'[^']*')
In PHP:
preg_match_all('/(\w+)\s*=\s*("[^"]*"|\'[^\']*\')/', $subject, $result, PREG_SET_ORDER);
fills $result with an array of matches. $result[n] will contain the details of the nth match, where
$result[n][0] is the entire match
$result[n][1] contains the keyword
$result[n][2] contains the value (including quotes)
Edit:
To match the value part without its quotes, regardless of the kind of quotes that are used, you need a slightly more complicated regex that uses a negative lookahead assertion:
(\w+)\s*=\s*(["'])((?:(?!\2).)*)\2
In PHP:
preg_match_all('/(\w+)\s*=\s*(["\'])((?:(?!\2).)*)\2/', $subject, $result, PREG_SET_ORDER);
with the results
$result[n][0]: entire match
$result[n][1]: keyword
$result[n][2]: quote character
$result[n][3]: value
Explanation:
(["']) # Match a quote (--> group 2)
( # Match and capture --> group 3...
(?: # the following regex:
(?!\2) # As long as the next character isn't the one in group 2,
. # match it (any character)
)* # any number of times.
) # End of capturing group 3
\2 # Then match the corresponding quote character.
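Putting this together, here is a minimal sketch of building the array the question asks for. The function name parseAttributes and the sample input are made up for illustration, and the s modifier is an addition (assuming values may contain newlines) so that . matches those too.

```php
<?php
// Hypothetical helper, not part of the original answer:
// turns KeyName="KeyValue" pairs into an associative array.
function parseAttributes(string $subject): array
{
    // Same regex as above; the trailing "s" is an assumption so "." also matches newlines.
    $pattern = '/(\w+)\s*=\s*(["\'])((?:(?!\2).)*)\2/s';
    $attributes = [];
    if (preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $match) {
            // $match[1] is the key name, $match[3] the value without its quotes.
            $attributes[$match[1]] = $match[3];
        }
    }
    return $attributes;
}

$parsed = parseAttributes('KeyName1="KeyValue1"   KeyName2 = \'Key"Value"2\'');
// $parsed is ['KeyName1' => 'KeyValue1', 'KeyName2' => 'Key"Value"2']
print_r($parsed);
```

If the same key appears twice in one string, the later value overwrites the earlier one in this sketch.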
A small variant of Tim Pietzcker's approach, using a branch reset group (?|...) so that both alternatives capture into group 2:
preg_match_all('/(\w+)\s*=\s*(?|"([^"]*)"|\'([^\']*)\')/', $subject, $result, PREG_SET_ORDER);
Then $result[n][2] contains the value without quotes.
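As a rough sketch of how this could feed the array from the question: since the key is always in element 1 and the value in element 2 of each match, array_column can index one by the other. The sample $subject is taken from the question; $arrayName is just the name used there.

```php
<?php
$subject = 'KeyName1="KeyValue1" KeyName2=\'KeyValue2\'';

preg_match_all('/(\w+)\s*=\s*(?|"([^"]*)"|\'([^\']*)\')/', $subject, $result, PREG_SET_ORDER);

// Take element 2 (the value) of every match, indexed by element 1 (the key name).
$arrayName = array_column($result, 2, 1);
// $arrayName is ['KeyName1' => 'KeyValue1', 'KeyName2' => 'KeyValue2']
print_r($arrayName);
```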