
Regular expression to match key-value pairs where value is in quotes or apostrophes

Tags: regex, php

I'm trying to complete a PHP app in the next 2 weeks and I just can't figure out the regular expression to parse some attribute strings.

I receive arbitrary strings in a format like this:

KeyName1="KeyValue1" KeyName2='KeyValue2'

There may be any number of key value pairs in a single string and the values can be delimited by either single quotes ' or double quotes " in any combination within one string (but they are always delimited).

The values can be of any length and contain any character, with one restriction: a double quote can't appear inside a double-quoted value, and a single quote can't appear inside a single-quoted value. Double quotes can appear inside single quotes, and single quotes can appear inside double quotes.

There can be any number of spaces between the key-value pairs, between the key name and the equals sign, and between the equals sign and the quote character that starts the value.

I need to turn the string into an array that looks like:

$arrayName["KeyName1"] = "KeyValue1"
$arrayName["KeyName2"] = "KeyValue2"

etc.

I'm fairly sure this can be done with regular expressions, but all my attempts have failed and I need some help (actually, lots of help :-) to get it done. I'm hoping some of the amazing people here can provide that help, or at least get me started.

David Husnian asked Dec 27 '22 02:12


2 Answers

Sure, no problem. Let's break it down:

\w+\s*=\s*

matches a keyword made of word characters (letters, digits, and underscores), followed by an equals sign that may be surrounded by whitespace.

"[^"]*"

matches an opening double quote, followed by any number of characters except another double quote, then a (closing) double quote.

'[^']*'

does the same for single quoted strings.

Combining that using capturing groups ((...)) with a simple alternation (|) gives you

(\w+)\s*=\s*("[^"]*"|'[^']*')

In PHP:

preg_match_all('/(\w+)\s*=\s*("[^"]*"|\'[^\']*\')/', $subject, $result, PREG_SET_ORDER);

fills $result with an array of matches. $result[n] will contain the details of the nth match, where

  • $result[n][0] is the entire match
  • $result[n][1] contains the keyword
  • $result[n][2] contains the value (including quotes)
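As a rough sketch of how these matches could be turned into the array the question asks for: the helper name parseAttributes and the sample input are made up for illustration, and the quotes are stripped with substr() because group 2 still includes them.

```php
<?php
// Hypothetical helper (not from the original answer): builds a
// key => value array from the matches of the first regex.
function parseAttributes(string $subject): array
{
    $pairs = [];
    preg_match_all('/(\w+)\s*=\s*("[^"]*"|\'[^\']*\')/', $subject, $result, PREG_SET_ORDER);
    foreach ($result as $match) {
        // $match[2] still carries its delimiters, so drop the first
        // and last character to get the bare value.
        $pairs[$match[1]] = substr($match[2], 1, -1);
    }
    return $pairs;
}

// Sample input with mixed quote styles and spaces around "=".
$arrayName = parseAttributes('KeyName1="KeyValue1" KeyName2 = \'KeyValue2\'');
// $arrayName is ['KeyName1' => 'KeyValue1', 'KeyName2' => 'KeyValue2']
```

If a key appears more than once in the string, this sketch keeps the last value seen.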

Edit:

To match the value part without its quotes, regardless of the kind of quotes that are used, you need a slightly more complicated regex that uses a backreference and a negative lookahead assertion:

(\w+)\s*=\s*(["'])((?:(?!\2).)*)\2

In PHP (with the s modifier added so that . also matches newlines inside a value):

preg_match_all('/(\w+)\s*=\s*(["\'])((?:(?!\2).)*)\2/s', $subject, $result, PREG_SET_ORDER);

with the results

  • $result[n][0]: entire match
  • $result[n][1]: keyword
  • $result[n][2]: quote character
  • $result[n][3]: value

Explanation:

(["'])    # Match a quote (--> group 2)
(         # Match and capture --> group 3...
 (?:      # the following regex:
  (?!\2)  # As long as the next character isn't the one in group 2,
  .       # match it (any character)
 )*       # any number of times.
)         # End of capturing group 3
\2        # Then match the corresponding quote character.
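A small worked example of the backreference version may help here. The sample string is invented for illustration, and the s modifier is an assumption on my part so that . would also match a newline inside a value. Each value contains the other kind of quote, which is the case the lookahead is there to handle.

```php
<?php
// Made-up sample input: the string is
//   title="It's here"   note = 'say "hi"'
$subject = 'title="It\'s here"   note = \'say "hi"\'';

preg_match_all('/(\w+)\s*=\s*(["\'])((?:(?!\2).)*)\2/s', $subject, $result, PREG_SET_ORDER);

$arrayName = [];
foreach ($result as $match) {
    // Group 1 is the key, group 2 the quote character, group 3 the bare value.
    $arrayName[$match[1]] = $match[3];
}
// $arrayName is ['title' => "It's here", 'note' => 'say "hi"']
```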
Tim Pietzcker answered Feb 09 '23 20:02


A slight variant of Tim Pietzcker's approach, using a branch reset group (?|...):

preg_match_all('/(\w+)\s*=\s*(?|"([^"]*)"|\'([^\']*)\')/', $subject, $result, PREG_SET_ORDER);

$result[n][2] then contains the value without quotes, because both alternatives inside the branch reset group share the same capture group number.
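Since key and value sit in fixed columns of each match here, one possible way to get the final array is array_column(), which takes the value column and the index column. This is only a sketch with sample input taken from the question; it is not part of the original answer.

```php
<?php
$subject = 'KeyName1="KeyValue1" KeyName2=\'KeyValue2\'';

preg_match_all('/(\w+)\s*=\s*(?|"([^"]*)"|\'([^\']*)\')/', $subject, $result, PREG_SET_ORDER);

// Take column 2 (the unquoted value) of every match and index it by
// column 1 (the key name).
$arrayName = array_column($result, 2, 1);
// $arrayName is ['KeyName1' => 'KeyValue1', 'KeyName2' => 'KeyValue2']
```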

Casimir et Hippolyte answered Feb 09 '23 19:02