I receive the response from a service as shown below. How do I parse it into a Map? I first thought of splitting at whitespace, but that doesn't work because a value may itself contain spaces; look, for example, at the value of the SA key in the response below.
One option I thought of is to split at whitespace only when the previous character is a double quote, but I'm not sure how to write the regex for that.
TX="0000000000108000001830001" FI="" OS="8" CI="QU01SF1S2032" AW="SSS" SA="1525 Windward Concourse"
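The split-after-a-quote idea from the question can be written with a zero-width lookbehind, `(?<=")\s+`, which matches whitespace only when the character before it is a double quote. The following is a minimal sketch, not a robust parser. It assumes every value is quoted and that no value contains a quote followed by whitespace. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LookbehindSplit {
    // Hypothetical helper: cuts the response into KEY="value" chunks at any
    // whitespace that directly follows a double quote, then splits each chunk.
    static Map<String, String> parse(String input) {
        Map<String, String> map = new LinkedHashMap<>();
        // (?<=") is a zero-width lookbehind: it checks the previous character
        // without consuming it, so the closing quote stays in the chunk
        for (String pair : input.trim().split("(?<=\")\\s+")) {
            if (pair.isEmpty()) {
                continue; // only happens for an empty input string
            }
            int eq = pair.indexOf('=');
            // assumes KEY="value": skip '=' and the opening quote, drop the closing one
            map.put(pair.substring(0, eq), pair.substring(eq + 2, pair.length() - 1));
        }
        return map;
    }

    public static void main(String[] args) {
        String input = "TX=\"0000000000108000001830001\" FI=\"\" OS=\"8\" "
                + "CI=\"QU01SF1S2032\" AW=\"SSS\" SA=\"1525 Windward Concourse\"";
        System.out.println(parse(input));
    }
}
```

For the sample response, this prints `{TX=0000000000108000001830001, FI=, OS=8, CI=QU01SF1S2032, AW=SSS, SA=1525 Windward Concourse}`. However, a value containing a quote followed by a space would be cut in the wrong place, which is why matching whole pairs with a regex tends to be more robust.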
Parse at the quotes. You could even use a regular expression to find each key/value pair, assuming each value is in quotes. My only question is: what are the rules if a value contains embedded quotes? (Are they escaped with '\' or similar? Either way, the code below does not currently account for that.)
For example:
(\w+)="([^"]*)"
This will even give you groups #1 and #2 that can be used to provide the key and the value, respectively.
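On the embedded-quote question raised above: suppose a literal quote inside a value is escaped with a backslash, as in `\"`. This is an assumption; the question does not say how quotes are escaped. Under that assumption, the value group can accept either any character other than a quote or backslash, or a backslash followed by any character. The escapes are stripped afterwards. The following is a sketch with hypothetical names:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedQuotes {
    // Regex: (\w+)="((?:[^"\\]|\\.)*)"
    // A value is a run of chars that are neither quote nor backslash,
    // or backslash escapes such as \" and \\ (assumed escaping convention).
    private static final Pattern PAIR =
            Pattern.compile("(\\w+)=\"((?:[^\"\\\\]|\\\\.)*)\"");

    static Map<String, String> parse(String input) {
        Map<String, String> map = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            // remove the backslash of each escape: \" -> " and \\ -> \
            String value = m.group(2).replaceAll("\\\\(.)", "$1");
            map.put(m.group(1), value);
        }
        return map;
    }

    public static void main(String[] args) {
        // NOTE is a made-up key whose value contains escaped quotes
        String input = "SA=\"1525 Windward Concourse\" NOTE=\"say \\\"hi\\\"\"";
        System.out.println(parse(input));
    }
}
```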
Run this in a loop using Java's Matcher.find() method until you have found all of the pairs.
Sample code:
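import java.util.regex.Matcher;
import java.util.regex.Pattern;
// wrapper class and main added so the snippet compiles; the class name is arbitrary
public class PairFinder {
    public static void main(String[] args) {
        String input = "TX=\"0000000000108000001830001\" FI=\"\" OS=\"8\" CI=\"QU01SF1S2032\" AW=\"SSS\" SA=\"1525 Windward Concourse\"";
        Pattern p = Pattern.compile("\\s*(\\w+)=\"([^\"]*)\"\\s*");
        Matcher m = p.matcher(input);
        while (m.find()) {
            System.out.println(m.group(1)); // key
            System.out.println(m.group(2)); // value (empty for FI)
        }
    }
}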
Output:
TX
0000000000108000001830001
FI

OS
8
CI
QU01SF1S2032
AW
SSS
SA
1525 Windward Concourse
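The question asks for a Map rather than printed output. The same find() loop can store each pair in a map instead. The sketch below uses a LinkedHashMap so that iteration follows the order of the response. That choice is arbitrary, and a repeated key would keep only its last value. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResponseToMap {
    private static final Pattern PAIR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    static Map<String, String> parse(String input) {
        // LinkedHashMap keeps the keys in the order they appear in the response
        Map<String, String> map = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            map.put(m.group(1), m.group(2)); // group 1 = key, group 2 = value
        }
        return map;
    }

    public static void main(String[] args) {
        String input = "TX=\"0000000000108000001830001\" FI=\"\" OS=\"8\" "
                + "CI=\"QU01SF1S2032\" AW=\"SSS\" SA=\"1525 Windward Concourse\"";
        Map<String, String> map = parse(input);
        System.out.println(map.get("SA")); // 1525 Windward Concourse
        System.out.println(map.size());    // 6
    }
}
```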
Judging by the text, it could be XML. Is that so, or is that text the raw response of the service? If it is XML, you can parse it easily with Groovy's XmlSlurper:
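// Groovy 3+: XmlSlurper is in groovy.xml (import groovy.xml.XmlSlurper); the default-imported groovy.util.XmlSlurper was removed in Groovy 4
def input = '<root TX="0000000000108000001830001" FI="" OS="8" CI="QU01SF1S2032" AW="SSS" SA="1525 Windward Concourse"></root>'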
def xml = new XmlSlurper().parseText(input)
def map = xml.attributes()
The map variable would be [CI:QU01SF1S2032, AW:SSS, TX:0000000000108000001830001, OS:8, FI:, SA:1525 Windward Concourse]
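Plain Java can do roughly the same as the XmlSlurper snippet, using the JDK's DOM parser. This sketch assumes the raw string is wrapped in a `<root .../>` element, as in the Groovy example. The element name, class name and method name are hypothetical. It would fail on values containing `<` or an unescaped `&`. DOM also does not promise any particular attribute order.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class AttributesAsXml {
    // Wraps the response in a made-up <root .../> element so that the
    // key/value pairs become XML attributes, then copies them into a map.
    static Map<String, String> parse(String response) {
        String xml = "<root " + response + "/>";
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Map<String, String> map = new LinkedHashMap<>();
            NamedNodeMap attrs = root.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                map.put(attr.getNodeName(), attr.getNodeValue());
            }
            return map;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // e.g. a value containing '<' or a bare '&' is not well-formed XML
            throw new IllegalArgumentException("response is not parseable as XML attributes", e);
        }
    }

    public static void main(String[] args) {
        String input = "TX=\"0000000000108000001830001\" FI=\"\" OS=\"8\" "
                + "CI=\"QU01SF1S2032\" AW=\"SSS\" SA=\"1525 Windward Concourse\"";
        System.out.println(parse(input));
    }
}
```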
If it's not XML, you can follow ziesemer's answer and use a regex. A Groovier version of his answer that builds a Map would be:
def input = 'TX="0000000000108000001830001" FI="" OS="8" CI="QU01SF1S2032" AW="SSS" SA="1525 Windward Concourse"'
def match = input =~ /(\w+)="([^"]*)"/
def map = [:]
match.each {
map[it[1]] = it[2]
}
The resulting map would be the same as before.
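From Java 9 onward, the Groovy snippet's collection style has a rough Java counterpart. `Matcher.results()` returns a stream of matches that can be collected straight into a map. The sketch below uses the same regex. The merge function (last value wins) and the LinkedHashMap are choices made here, not part of the original answer. The class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StreamToMap {
    private static final Pattern PAIR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    // Matcher.results() (Java 9+) streams every match, much like Groovy's match.each
    static Map<String, String> parse(String input) {
        return PAIR.matcher(input).results()
                .collect(Collectors.toMap(
                        r -> r.group(1),            // key
                        r -> r.group(2),            // value
                        (first, second) -> second,  // repeated key: keep the last value
                        LinkedHashMap::new));       // keep the order of the response
    }

    public static void main(String[] args) {
        String input = "TX=\"0000000000108000001830001\" FI=\"\" OS=\"8\" "
                + "CI=\"QU01SF1S2032\" AW=\"SSS\" SA=\"1525 Windward Concourse\"";
        System.out.println(parse(input));
    }
}
```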
StreamTokenizer is fast, although I haven't used the quoteChar() feature. Examples may be found here, here and here.
Console:
TX=0000000000108000001830001
FI=
OS=8
CI=QU01SF1S2032
AW=SSS
SA=1525 Windward Concourse
Count: 6
0.623 ms
Code:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
/** @see https://stackoverflow.com/questions/8867325 */
public class TokenizerTest {
private static final String s = ""
+ "TX=\"0000000000108000001830001\" FI=\"\" OS=\"8\" "
+ "CI=\"QU01SF1S2032\" AW=\"SSS\" SA=\"1525 Windward Concourse\"";
private static final char equal = '=';
private static final char quote = '"';
private static StreamTokenizer tokens = new StreamTokenizer(
new BufferedReader(new StringReader(s)));
public static void main(String[] args) {
long start = System.nanoTime();
tokenize();
long stop = System.nanoTime();
System.out.println((stop - start) / 1000000d + " ms");
}
private static void tokenize() {
tokens.ordinaryChar(equal); // return '=' as a single-character token
tokens.quoteChar(quote);    // return "..." as one token, with the text in sval
try {
int count = 0;
int token = tokens.nextToken();
while (token != StreamTokenizer.TT_EOF) {
if (token == StreamTokenizer.TT_WORD) {
System.out.print(tokens.sval);
count++;
}
if (token == equal) {
System.out.print(equal);
}
if (token == quote) {
System.out.println(tokens.sval);
}
token = tokens.nextToken();
}
System.out.println("Count: " + count);
} catch (IOException e) {
e.printStackTrace();
}
}
}
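The tokenizer above only prints the pairs. To build the Map the question asks for, the loop can remember the last word token as the key and store the following quoted token as its value. This sketch uses the same `ordinaryChar`/`quoteChar` settings, and the class name is hypothetical. It assumes each key consists of letters, optionally followed by digits, which StreamTokenizer's default word characters accept. A key starting with a digit or containing `_` would not be read as a word.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class TokenizerToMap {
    static Map<String, String> parse(String input) {
        StreamTokenizer tokens = new StreamTokenizer(new StringReader(input));
        tokens.ordinaryChar('=');  // '=' comes back as its own token
        tokens.quoteChar('"');     // "..." comes back as one token, text in sval
        Map<String, String> map = new LinkedHashMap<>();
        String key = null;
        try {
            int token;
            while ((token = tokens.nextToken()) != StreamTokenizer.TT_EOF) {
                if (token == StreamTokenizer.TT_WORD) {
                    key = tokens.sval;            // a key such as TX or SA
                } else if (token == '"' && key != null) {
                    map.put(key, tokens.sval);    // quoted value for the last key
                    key = null;
                }
                // '=' tokens need no action
            }
        } catch (IOException e) {
            // reading from a StringReader should not fail; rethrow unchecked just in case
            throw new UncheckedIOException(e);
        }
        return map;
    }

    public static void main(String[] args) {
        String input = "TX=\"0000000000108000001830001\" FI=\"\" OS=\"8\" "
                + "CI=\"QU01SF1S2032\" AW=\"SSS\" SA=\"1525 Windward Concourse\"";
        System.out.println(parse(input));
    }
}
```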