Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a simple way of parsing this text into a Map

I receive the response from a service as below. How to parse this into a Map? I first thought of split at whitespace but it doesn't work as the value might contain spaces e.g. look at the value of SA key in the below response.

One option I thought of is to split at whitespace provided the previous character is a double quote. Not sure how to write the regex for this though.

TX="0000000000108000001830001" FI="" OS="8" CI="QU01SF1S2032" AW="SSS" SA="1525 Windward Concourse"

like image 777
Aravind Yarram Avatar asked Jan 15 '12 03:01

Aravind Yarram


3 Answers

Parse at quotes. You could even use a regular expression to find each key/value pair, assuming each value is in quotes. My only question would be, what are the rules for if a value contains embedded quotes? (Are they escaped using '\' or such? Regardless, this is not currently accounted for in the below...)

For example:

(\w+)="([^"]*)"

This will even give you groups #1 and #2 that can be used to provide the key and the value, respectively.

Run this in a loop, using Java's Matcher.find() method, until you find all of the pairs.

Sample code:

String input = "TX=\"0000000000108000001830001\" FI=\"\" OS=\"8\" CI=\"QU01SF1S2032\" AW=\"SSS\" SA=\"1525 Windward Concourse\"";

Pattern p = Pattern.compile("\\s*(\\w+)=\"([^\"]*)\"\\s*");

Matcher m = p.matcher(input);
while(m.find()){
    System.out.println(m.group(1));
    System.out.println(m.group(2));
}

Output:

TX
0000000000108000001830001
FI

OS
8
CI
QU01SF1S2032
AW
SSS
SA
1525 Windward Concourse
like image 149
ziesemer Avatar answered Sep 28 '22 04:09

ziesemer


By the looks of the text it seems that it could be an XML. Is that so, or is that text the raw response of the service? If it is an XML you can parse it easily with Groovy's XmlSlurper:

def input = '<root TX="0000000000108000001830001" FI="" OS="8" CI="QU01SF1S2032" AW="SSS" SA="1525 Windward Concourse"></root>'
def xml = new XmlSlurper().parseText(input)

def map = xml.attributes()

The map variable would be [CI:QU01SF1S2032, AW:SSS, TX:0000000000108000001830001, OS:8, FI:, SA:1525 Windward Concourse]

If it's not an XML, you may follow ziesemer's answer and use a regex. A groovier version of his answer that generates a Map would be:

def input = 'TX="0000000000108000001830001" FI="" OS="8" CI="QU01SF1S2032" AW="SSS" SA="1525 Windward Concourse"'
def match = input =~ /(\w+)="([^"]*)"/

def map = [:]
match.each {
    map[it[1]] = it[2]
}

The result of map would be the same as before.

like image 42
epidemian Avatar answered Sep 28 '22 05:09

epidemian


StreamTokenizer is fast, although I haven't used the quoteChar() feature. Examples may be found here, here and here.

Console:

TX=0000000000108000001830001
FI=
OS=8
CI=QU01SF1S2032
AW=SSS
SA=1525 Windward Concourse
Count: 6
0.623 ms

Code:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

/** @see https://stackoverflow.com/questions/8867325 */
public class TokenizerTest {

    private static final String s = ""
        + "TX=\"0000000000108000001830001\" FI=\"\" OS=\"8\" "
        + "CI=\"QU01SF1S2032\" AW=\"SSS\" SA=\"1525 Windward Concourse\"";
    private static final char equal = '=';
    private static final char quote = '"';
    private static StreamTokenizer tokens = new StreamTokenizer(
        new BufferedReader(new StringReader(s)));

    public static void main(String[] args) {
        long start = System.nanoTime();
        tokenize();
        long stop = System.nanoTime();
        System.out.println((stop - start) / 1000000d + " ms");
    }

    private static void tokenize() {
        tokens.ordinaryChar(equal);
        tokens.quoteChar(quote);
        try {
            int count = 0;
            int token = tokens.nextToken();
            while (token != StreamTokenizer.TT_EOF) {
                if (token == StreamTokenizer.TT_WORD) {
                    System.out.print(tokens.sval);
                    count++;
                }
                if (token == equal) {
                    System.out.print(equal);
                }
                if (token == quote) {
                    System.out.println(tokens.sval);
                }
                token = tokens.nextToken();
            }
            System.out.println("Count: " + count);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
like image 45
trashgod Avatar answered Sep 28 '22 05:09

trashgod