Parsing a table using regex - Java

I'm parsing the following AWS instance cost table:

m1.small    1   1   1.7     1 x 160    $0.044 per Hour
m1.medium   1   2   3.75    1 x 410    $0.087 per Hour
m1.large    2   4   7.5     2 x 420    $0.175 per Hour
m1.xlarge   4   8   15      4 x 420    $0.35 per Hour

The costs are stored in a file, which I read like this:

Scanner input = new Scanner(file); // file is a java.io.File; throws FileNotFoundException
String[] values;
while (input.hasNextLine()) {
    String line = input.nextLine();
    values = line.split("\\s+"); // <-- not what I want...
    for (String v : values)
        System.out.println(v);
}
input.close();

However, that gives me:

m1.small
1
1
1.7
1
x
160
$0.044
per
Hour

which is not what I want. The correctly parsed values (with the right regex) would look like this:

['m1.small', '1', '1', '1.7', '1 x 160', '$0.044', 'per Hour']

What would be the right regex to obtain that result? You can assume the table always follows the same pattern.

cybertextron asked Dec 25 '15 at 03:12



2 Answers

Try this regex (live demo: https://regex101.com/r/sP6zW5/1):

([^\s]+)\s+(\d+)\s+(\d+)\s+([\d\.]+)\s+(\d+ x \d+)\s+(\$\d+\.\d+)\s+(per \w+)

Match each line against it; the seven capture groups are your list.

I think using split is too complicated in your case. If the text always has the same layout, matching it with a regex is just the reverse of string formatting.
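In Java source the backslashes in that regex have to be doubled. Here is a minimal sketch, assuming each line of the file is matched on its own; the class name `AmowRegexParser` and the `parse` helper are hypothetical, and I've used `matches()` instead of `find()` so that a line which doesn't follow the pattern as a whole is rejected (returns `null`) rather than partially matched:

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: amow's regex with backslashes doubled for a Java string literal.
class AmowRegexParser {
    // Seven capture groups: type, two integer columns, memory, "N x M" storage, price, "per Hour".
    private static final Pattern ROW = Pattern.compile(
            "([^\\s]+)\\s+(\\d+)\\s+(\\d+)\\s+([\\d\\.]+)\\s+(\\d+ x \\d+)\\s+(\\$\\d+\\.\\d+)\\s+(per \\w+)");

    // Returns the captured groups in order, or null if the whole line does not match.
    static String[] parse(String line) {
        Matcher m = ROW.matcher(line.trim());
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            fields[i - 1] = m.group(i);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "m1.small    1   1   1.7     1 x 160    $0.044 per Hour";
        System.out.println(Arrays.toString(parse(line)));
        // prints [m1.small, 1, 1, 1.7, 1 x 160, $0.044, per Hour]
    }
}
```

Looping over `groupCount()` avoids listing `group(1)` to `group(7)` by hand, so adding a column only means changing the pattern.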

amow answered Sep 24 '22 at 08:09



If you want to use a regular expression, you'd do this:

        // Requires: java.util.Arrays, java.util.regex.Matcher, java.util.regex.Pattern
        String s = "m1.small    1   1   1.7     1 x 160    $0.044 per Hour";
        String spaces = "\\s+";             // gap between columns
        String type = "(.*?)";              // instance type, e.g. m1.small (reluctant)
        String intNumber = "(\\d+)";        // integer columns
        String doubleNumber = "([0-9.]+)";  // memory, e.g. 1.7 or 15
        String dollarNumber = "([$0-9.]+)"; // price, e.g. $0.044
        String aXb = "(\\d+ x \\d+)";       // storage, e.g. 1 x 160
        String rest = "(.*)";               // remainder, e.g. per Hour

        Pattern pattern = Pattern.compile(type + spaces + intNumber + spaces + intNumber + spaces + doubleNumber
                + spaces + aXb + spaces + dollarNumber + spaces + rest);
        Matcher matcher = pattern.matcher(s);
        while (matcher.find()) {
            // groups 1..7 are the seven columns, in order
            String[] fields = new String[] { matcher.group(1), matcher.group(2), matcher.group(3), matcher.group(4),
                    matcher.group(5), matcher.group(6), matcher.group(7) };
            System.out.println(Arrays.toString(fields));
        }
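To run the same composed pattern over the whole file, it can go inside the question's `Scanner` loop. This is a sketch under a few assumptions: `ComposedPatternTable` and `parseAll` are hypothetical names, `new Scanner(String)` stands in for `new Scanner(file)` so the example is self-contained, and lines that don't match (such as blank lines) are simply skipped:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the composed pattern applied to every line read from a Scanner.
class ComposedPatternTable {
    private static final String SPACES = "\\s+";
    private static final Pattern ROW = Pattern.compile(
            "(.*?)" + SPACES + "(\\d+)" + SPACES + "(\\d+)" + SPACES + "([0-9.]+)"
            + SPACES + "(\\d+ x \\d+)" + SPACES + "([$0-9.]+)" + SPACES + "(.*)");

    // Collects the seven fields of each matching line; non-matching lines are skipped.
    static List<String[]> parseAll(Scanner input) {
        List<String[]> rows = new ArrayList<>();
        while (input.hasNextLine()) {
            Matcher m = ROW.matcher(input.nextLine().trim());
            if (m.matches()) {
                String[] fields = new String[m.groupCount()];
                for (int i = 1; i <= m.groupCount(); i++) {
                    fields[i - 1] = m.group(i);
                }
                rows.add(fields);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String table = "m1.small    1   1   1.7     1 x 160    $0.044 per Hour\n"
                + "m1.medium   1   2   3.75    1 x 410    $0.087 per Hour\n";
        for (String[] row : parseAll(new Scanner(table))) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

With a real file you would call `parseAll(new Scanner(file))` and handle the `FileNotFoundException` that constructor can throw.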

Notice how I've broken the regular expression into named pieces (`spaces`, `type`, `intNumber` and so on) to keep it readable; as one long String it is hard to read and maintain. There's another way of doing it, though: since you know which fields get split apart, you could do a simple split and build a new array that recombines them:

        // Requires: java.util.Arrays
        String[] allFields = s.split("\\s+");
        String[] result = new String[] {
            allFields[0],
            allFields[1],
            allFields[2],
            allFields[3],
            allFields[4] + " " + allFields[5] + " " + allFields[6], // "1 x 160"
            allFields[7],
            allFields[8] + " " + allFields[9] };                   // "per Hour"
        System.out.println(Arrays.toString(result));
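The split version relies on every line producing exactly ten tokens, so a malformed line would throw `ArrayIndexOutOfBoundsException` or, if it has extra tokens, quietly shift fields. Here is a sketch that checks the token count first; `SplitRecombine` and `parse` are hypothetical names, and throwing `IllegalArgumentException` is my choice rather than anything the original requires:

```java
import java.util.Arrays;

// Hypothetical helper: split on whitespace, then glue "1 x 160" and "per Hour" back together.
class SplitRecombine {
    static String[] parse(String line) {
        // trim() first: a leading space would otherwise produce an empty first token.
        String[] t = line.trim().split("\\s+");
        if (t.length != 10) {
            throw new IllegalArgumentException("Expected 10 tokens, got " + t.length + ": " + line);
        }
        return new String[] {
            t[0], t[1], t[2], t[3],
            t[4] + " " + t[5] + " " + t[6], // storage, e.g. "1 x 160"
            t[7],                           // price, e.g. "$0.044"
            t[8] + " " + t[9]               // "per Hour"
        };
    }

    public static void main(String[] args) {
        String s = "m1.medium   1   2   3.75    1 x 410    $0.087 per Hour";
        System.out.println(Arrays.toString(parse(s)));
        // prints [m1.medium, 1, 2, 3.75, 1 x 410, $0.087, per Hour]
    }
}
```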

Jeanne Boyarsky answered Sep 21 '22 at 08:09
