Is there a simple way of parsing this text into a Map

Question

I receive a response from a service as shown below. How can I parse it into a Map? I first thought of splitting at whitespace, but that doesn't work because a value might contain spaces; for example, look at the value of the SA key in the response below.

One option I considered is to split at whitespace only when the preceding character is a double quote, but I'm not sure how to write the regex for that.

TX="0000000000108000001830001" FI="" OS="8" CI="QU01SF1S2032" AW="SSS" SA="1525 Windward Concourse"
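The split-after-a-quote idea can be written with a zero-width lookbehind, (?<=")\s+, which matches a run of whitespace only when the character before it is a double quote. The following is a minimal sketch, not a robust parser. It assumes that values never contain a double quote, that no value begins with whitespace (a leading space inside a value is also preceded by the opening quote and would be split there), and that the input is non-empty. The class name LookbehindSplit is illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LookbehindSplit {
    // Splits on whitespace that immediately follows a double quote,
    // then cuts each KEY="VALUE" piece at its first '='.
    static Map<String, String> parse(String input) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String pair : input.trim().split("(?<=\")\\s+")) {
            int eq = pair.indexOf('=');
            // Skip the '=' and the opening quote; drop the closing quote.
            map.put(pair.substring(0, eq), pair.substring(eq + 2, pair.length() - 1));
        }
        return map;
    }

    public static void main(String[] args) {
        String input = "TX=\"0000000000108000001830001\" FI=\"\" OS=\"8\" "
                + "CI=\"QU01SF1S2032\" AW=\"SSS\" SA=\"1525 Windward Concourse\"";
        System.out.println(parse(input));
    }
}
```

The spaces inside "1525 Windward Concourse" are preceded by letters or digits, not a quote, so they are left alone.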

ziesemer · Accepted Answer

Parse at the quotes. You could even use a regular expression to find each key/value pair, assuming every value is quoted. My only question is: what are the rules if a value contains embedded quotes? (Are they escaped with '\' or something similar? Either way, the code below does not account for that.)

For example:

(\w+)="([^"]*)"

This will even give you groups #1 and #2 that can be used to provide the key and the value, respectively.

Run this in a loop, using Java's Matcher.find() method, until you find all of the pairs.

Sample code:

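import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Sample {
    public static void main(String[] args) {
        String input = "TX=\"0000000000108000001830001\" FI=\"\" OS=\"8\" CI=\"QU01SF1S2032\" AW=\"SSS\" SA=\"1525 Windward Concourse\"";

        // Regex backslashes must be doubled inside a Java string literal.
        Pattern p = Pattern.compile("\\s*(\\w+)=\"([^\"]*)\"\\s*");

        Matcher m = p.matcher(input);
        while (m.find()) {
            System.out.println(m.group(1)); // key
            System.out.println(m.group(2)); // value (may be empty)
        }
    }
}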

Output:

TX
0000000000108000001830001
FI

OS
8
CI
QU01SF1S2032
AW
SSS
SA
1525 Windward Concourse
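Since the question asks for a Map rather than printed output, the same find() loop can put each pair into a map instead. This is a minimal sketch. LinkedHashMap is an assumed choice that keeps the input order (a HashMap works too if order doesn't matter), and RegexToMap is an illustrative name. Like the pattern above, it does not handle embedded quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexToMap {
    // Group 1 = key, group 2 = everything between the quotes (may be empty).
    private static final Pattern PAIR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    static Map<String, String> parse(String input) {
        Map<String, String> map = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            map.put(m.group(1), m.group(2));
        }
        return map;
    }

    public static void main(String[] args) {
        String input = "TX=\"0000000000108000001830001\" FI=\"\" OS=\"8\" "
                + "CI=\"QU01SF1S2032\" AW=\"SSS\" SA=\"1525 Windward Concourse\"";
        // Prints {TX=0000000000108000001830001, FI=, OS=8, CI=QU01SF1S2032, AW=SSS, SA=1525 Windward Concourse}
        System.out.println(parse(input));
    }
}
```

The leading and trailing \s* from the sample pattern are dropped here because find() already skips over unmatched text between pairs.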

epidemian · Answer

Judging by the text, it could be XML. Is that so, or is this the raw response from the service? If it is XML, you can parse it easily with Groovy's XmlSlurper (here the attributes are wrapped in a root element to make a well-formed document):

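import groovy.xml.XmlSlurper  // Groovy 3+; older versions use groovy.util.XmlSlurper, which is imported by default

def input = '<root TX="0000000000108000001830001" FI="" OS="8" CI="QU01SF1S2032" AW="SSS" SA="1525 Windward Concourse"></root>'
def xml = new XmlSlurper().parseText(input)

def map = xml.attributes()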

The map variable would be [CI:QU01SF1S2032, AW:SSS, TX:0000000000108000001830001, OS:8, FI:, SA:1525 Windward Concourse]
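For plain Java, the same "wrap it in an element" approach works with the JDK's built-in DOM parser. This is a minimal sketch. It assumes the response contains nothing that is illegal in an XML attribute value: a raw & or < would make parsing fail, and XML attribute normalization turns tabs and newlines inside values into spaces. The wrapper element name root and the class name are illustrative. Attribute order in a DOM NamedNodeMap is not guaranteed to follow the source order.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XmlAttributesToMap {
    static Map<String, String> parse(String response) throws Exception {
        // Wrap the raw attribute list in a dummy element so it forms a well-formed document.
        String xml = "<root " + response + "/>";
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        NamedNodeMap attrs = root.getAttributes();
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            map.put(attr.getNodeName(), attr.getNodeValue());
        }
        return map;
    }

    public static void main(String[] args) throws Exception {
        String input = "TX=\"0000000000108000001830001\" FI=\"\" OS=\"8\" "
                + "CI=\"QU01SF1S2032\" AW=\"SSS\" SA=\"1525 Windward Concourse\"";
        System.out.println(parse(input));
    }
}
```

One side benefit over a hand-written regex is that an embedded quote written as &quot; would be decoded by the parser. Whether the service ever emits that is an open question.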

If it's not XML, you can follow ziesemer's answer and use a regex. A Groovier version of that answer that builds a Map would be:

def input = 'TX="0000000000108000001830001" FI="" OS="8" CI="QU01SF1S2032" AW="SSS" SA="1525 Windward Concourse"'
def match = input =~ /(\w+)="([^"]*)"/

def map = [:]
match.each {
    map[it[1]] = it[2]
}

The result of map would be the same as before.

trashgod · Answer

StreamTokenizer is fast, although I hadn't used its quoteChar() feature before. Examples may be found here, here and here.

Console:

TX=0000000000108000001830001
FI=
OS=8
CI=QU01SF1S2032
AW=SSS
SA=1525 Windward Concourse
Count: 6
0.623 ms

Code:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

/** @see https://stackoverflow.com/questions/8867325 */
public class TokenizerTest {

    private static final String s = ""
        + "TX=\"0000000000108000001830001\" FI=\"\" OS=\"8\" "
        + "CI=\"QU01SF1S2032\" AW=\"SSS\" SA=\"1525 Windward Concourse\"";
    private static final char equal = '=';
    private static final char quote = '"';
    private static StreamTokenizer tokens = new StreamTokenizer(
        new BufferedReader(new StringReader(s)));

    public static void main(String[] args) {
        long start = System.nanoTime();
        tokenize();
        long stop = System.nanoTime();
        System.out.println((stop - start) / 1000000d + " ms");
    }

    private static void tokenize() {
        tokens.ordinaryChar(equal);
        tokens.quoteChar(quote);
        try {
            int count = 0;
            int token = tokens.nextToken();
            while (token != StreamTokenizer.TT_EOF) {
                if (token == StreamTokenizer.TT_WORD) {
                    System.out.print(tokens.sval);
                    count++;
                }
                if (token == equal) {
                    System.out.print(equal);
                }
                if (token == quote) {
                    System.out.println(tokens.sval);
                }
                token = tokens.nextToken();
            }
            System.out.println("Count: " + count);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
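The tokenizer above prints each pair. To get the Map the question asks for, the same token stream can fill one directly. This is a minimal sketch. It assumes the input strictly alternates key, '=', quoted value, as in the sample, and it does not validate that sequence. It also assumes keys start with a letter: StreamTokenizer parses numbers by default, so a key starting with a digit would come back as TT_NUMBER. The wordChars('_', '_') call is an added assumption that keeps keys like MY_KEY whole. TokenizerToMap is an illustrative name.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

public class TokenizerToMap {
    static Map<String, String> parse(String input) throws IOException {
        StreamTokenizer tokens = new StreamTokenizer(new StringReader(input));
        tokens.ordinaryChar('=');
        tokens.quoteChar('"');
        tokens.wordChars('_', '_'); // assumption: keys may contain underscores
        Map<String, String> map = new LinkedHashMap<>();
        String key = null;
        int token;
        while ((token = tokens.nextToken()) != StreamTokenizer.TT_EOF) {
            if (token == StreamTokenizer.TT_WORD) {
                key = tokens.sval;         // remember the key until its value arrives
            } else if (token == '"' && key != null) {
                map.put(key, tokens.sval); // sval holds the text between the quotes
                key = null;
            }
            // '=' tokens carry no data and are skipped.
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        String input = "TX=\"0000000000108000001830001\" FI=\"\" OS=\"8\" "
                + "CI=\"QU01SF1S2032\" AW=\"SSS\" SA=\"1525 Windward Concourse\"";
        System.out.println(parse(input));
    }
}
```

Pairing each quoted token with the most recent word avoids tracking the '=' explicitly. A stricter version could reject a value that is not preceded by a key and '='.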

Tags: java, regex, algorithm, parsing, groovy

Aravind Yarram
