I'm trying to capture key-value pairs from strings that have the following form:
a0=d235 a1=2314 com1="abcd" com2="a b c d"
Using help from this post, I was able to write the following regex that captures the key-value pairs:
Pattern.compile("(\\w*)=(\"[^\"]*\"|[^\\s]*)");
The problem is that the second group in this pattern also captures the quotation marks, as follows:
a0=d235
a1=2314
com1="abcd"
com2="a b c d"
How do I exclude the quotation marks? I want something like this:
a0=d235
a1=2314
com1=abcd
com2=a b c d
EDIT:
It is possible to achieve the above by capturing the value in different groups depending on whether it is quoted. However, I'm writing this code for a parser, so for performance reasons I'd like a regex that always returns the value in the same group number.
How about this? The idea is to split the last group into 2 groups.
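import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Group 2 holds a quoted value (without the quotes), group 3 an unquoted one.
// The non-capturing (?:...) limits the alternation to the value, so the key is always in group 1.
Pattern p = Pattern.compile("(\\w+)=(?:\"([^\"]*)\"|(\\S+))");
String test = "a0=d235 a1=2314 com1=\"abcd\" com2=\"a b c d\"";
Matcher m = p.matcher(test);
while (m.find()) {
    System.out.print(m.group(1));
    System.out.print("=");
    // Exactly one of groups 2 and 3 is non-null for each match.
    System.out.print(m.group(2) == null ? m.group(3) : m.group(2));
    System.out.println();
}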
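For reference, here is a self-contained sketch of the two-group idea that collects the pairs into a map. The class name `TwoGroupKeyValueParser` and the `parse` helper are hypothetical names chosen for illustration. The alternation is wrapped in a non-capturing group so that it applies only to the value and not to the key.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class TwoGroupKeyValueParser {
    // Group 1: key. Group 2: quoted value without the quotes. Group 3: unquoted value.
    private static final Pattern PAIR =
            Pattern.compile("(\\w+)=(?:\"([^\"]*)\"|(\\S+))");

    // Returns the pairs in the order they appear in the input.
    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            // Only one of the two value groups participates in a given match.
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            pairs.put(m.group(1), value);
        }
        return pairs;
    }

    public static void main(String[] args) {
        String test = "a0=d235 a1=2314 com1=\"abcd\" com2=\"a b c d\"";
        parse(test).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The null check on group 2 is the per-match cost the question wants to avoid. It is a single reference comparison, so it may be worth measuring before ruling this version out.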
Update
Here is a new solution in response to the updated question. This regex uses positive look-behind and look-ahead to check that the value is surrounded by quotes without including them in the captured group. This way, groups 2 and 3 above can be merged into a single group (group 2 below). Note that group 0 (the whole match) still contains the quotes; there is no way to exclude them from it.
Pattern p = Pattern.compile("(\\w+)=\"*((?<=\")[^\"]+(?=\")|([^\\s]+))\"*");
String test = "a0=d235 a1=2314 com1=\"abcd\" com2=\"a b c d\"";
Matcher m = p.matcher(test);
while (m.find()) {
    System.out.print(m.group(1));
    System.out.print("=");
    // The value is in group 2 whether or not it was quoted.
    System.out.println(m.group(2));
}
Output
a0=d235
a1=2314
com1=abcd
com2=a b c d
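The single-group approach can also be packaged as a small reusable helper, sketched below. The class name `SingleGroupKeyValueParser` and the `parse` method are hypothetical. The regex is a slightly tightened variant of the one above, and that variant is an assumption on my part rather than the original answer's pattern: it uses `\"?` instead of `\"*`, uses `[^\"]*` so that an empty quoted value still matches, and drops the inner capturing group.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class SingleGroupKeyValueParser {
    // Compiled once and reused, since compiling a Pattern is relatively expensive.
    // Group 1: key. Group 2: value, never including the surrounding quotes.
    // The quotes are consumed outside group 2; the lookarounds only assert they are there.
    private static final Pattern PAIR =
            Pattern.compile("(\\w+)=\"?((?<=\")[^\"]*(?=\")|\\S+)\"?");

    // Returns the pairs in the order they appear in the input.
    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2)); // same group for both value forms
        }
        return pairs;
    }

    public static void main(String[] args) {
        String test = "a0=d235 a1=2314 com1=\"abcd\" com2=\"a b c d\"";
        parse(test).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Running `main` on the sample string prints the same four lines as the output above. Input with an unterminated quote or with escaped quotes inside a value is not handled by this sketch.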