 

Regular expression for parsing name value pairs

Tags:

regex

Can someone provide a regular expression for parsing name/value pairs from a string? The pairs are separated by commas, and the value can optionally be enclosed in quotes. For example:

AssemblyName=foo.dll,ClassName="SomeClass",Parameters="Some,Parameters"
Chris Karcher asked Oct 03 '08 18:10




2 Answers

  • No escape:

    /([^=,]*)=("[^"]*"|[^,"]*)/
    
  • Double quote escape for both key and value:

    /((?:"[^"]*"|[^=,])*)=((?:"[^"]*"|[^=,])*)/
    
    key=value,"key with "" in it"="value with "" in it",key=value" "with" "spaces
    
  • Backslash string escape:

    /([^=,]*)=("(?:\\.|[^"\\]+)*"|[^,"]*)/
    
    key=value,key="value",key="val\"ue"
    
  • Full backslash escape:

    /((?:\\.|[^=,]+)*)=("(?:\\.|[^"\\]+)*"|(?:\\.|[^,"\\]+)*)/
    
    key=value,key="value",key="val\"ue",ke\,y=val\,ue
    

Edit: Added escaping alternatives.

Edit2: Added another escaping alternative.

The patterns capture the raw text, so you would still have to clean up the keys and values afterwards by removing any escape characters and surrounding quotes.
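As a rough illustration of that clean-up step, here is a minimal Python sketch (Python is used only for illustration; the patterns themselves are not tied to it). It applies the "backslash string escape" pattern unchanged, then strips surrounding quotes and resolves backslash escapes. The names `PAIR`, `clean` and `parse_pairs` are made up for this example, and treating `\x` as a literal `x` is an assumption about what the escapes should mean.

```python
import re

# The "backslash string escape" pattern from the list above, unchanged.
PAIR = re.compile(r'([^=,]*)=("(?:\\.|[^"\\]+)*"|[^,"]*)')


def clean(value):
    """Strip surrounding quotes and resolve backslash escapes.

    Assumption: inside a quoted value, a backslash followed by any
    character stands for that character. Unquoted values are returned
    as-is, because this pattern only supports escapes inside quotes.
    """
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = re.sub(r'\\(.)', r'\1', value[1:-1])
    return value


def parse_pairs(text):
    """Return a list of (key, cleaned value) tuples found in text."""
    return [(m.group(1), clean(m.group(2))) for m in PAIR.finditer(text)]


if __name__ == "__main__":
    line = 'AssemblyName=foo.dll,ClassName="SomeClass",Parameters="Some,Parameters"'
    for key, value in parse_pairs(line):
        print(key, "->", value)
```

With the input from the question this yields three pairs, and the comma inside `"Some,Parameters"` stays part of the value. A list of tuples is returned rather than a dict so that repeated keys are not silently dropped.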

Markus Jarderot answered Nov 01 '22 21:11


Nice answer from MizardX. Minor niggles - it doesn't allow for spaces around names etc (which may not matter), and it collects the quotes as well as the quoted value (which also may not matter), and it doesn't have an escape mechanism for embedding double quote characters in the quoted value (which, once more, may not matter).

As written, the pattern works with most of the extended regular expression systems. Fixing the niggles would probably require descent into, say, Perl. This version uses doubled quotes to escape -- hence a="a""b" generates a field value 'a""b' (which ain't perfect, but could be fixed afterwards easily enough):

/\s*([^=,\s]+)\s*=\s*(?:"((?:[^"]|"")*)"|([^,"]*))\s*,?/

Further, you'd have to use $2 or $3 to collect the value, whereas with MizardX's answer, you simply use $2. So, it isn't as easy or nice, but it covers a few edge cases. If the simpler answer is adequate, use it.
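For comparison, a minimal sketch of the same idea in Python (chosen only for illustration; `PAIR` and `parse_doubled_quotes` are hypothetical names). It picks whichever of the two value groups took part in the match and does the "fixed afterwards" step by collapsing doubled quotes. Trimming trailing whitespace from unquoted values is an extra assumption, since the unquoted alternative can swallow spaces before the comma.

```python
import re

# Same pattern as above: group 1 is the name, group 2 a quoted value
# (without the surrounding quotes), group 3 an unquoted value.
PAIR = re.compile(r'\s*([^=,\s]+)\s*=\s*(?:"((?:[^"]|"")*)"|([^,"]*))\s*,?')


def parse_doubled_quotes(text):
    """Return (name, value) tuples, resolving "" to " in quoted values."""
    pairs = []
    for m in PAIR.finditer(text):
        name, quoted, bare = m.groups()
        if quoted is not None:
            # Quoted alternative matched: collapse the doubled quotes.
            value = quoted.replace('""', '"')
        else:
            # Unquoted alternative matched: drop trailing whitespace
            # that [^,"]* picked up before the comma (an assumption).
            value = bare.rstrip()
        pairs.append((name, value))
    return pairs


if __name__ == "__main__":
    for name, value in parse_doubled_quotes('a="a""b", name = plain , c="x,y"'):
        print(name, "->", value)
```

The check on `quoted is not None` plays the role of choosing between $2 and $3: exactly one of the two groups participates in any given match.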

Test script:

#!/bin/perl -w

use strict;
my $qr = qr/\s*([^=,\s]+)\s*=\s*(?:"((?:[^"]|"")*)"|([^,"]*))\s*,?/;

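# Read input line by line and take one name/value pair per inner iteration.
while (<>)
{
    while (m/$qr/)
    {
        # Only one of $2 (quoted value) and $3 (unquoted value) is defined per match.
        print "1= $1, 2 = $2, 3 = $3\n";
        # Remove the pair just matched so the next iteration sees the rest of the line.
        $_ =~ s/$qr//;
    }
}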

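Under -w, this witters about either $2 or $3 being undefined, and accurately so: only one of the two value alternatives takes part in each match, so the other capture is left undefined.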

Jonathan Leffler answered Nov 01 '22 23:11