
Regex to split nested coordinate strings

Tags: java, regex

I have a String of the format "[(1, 2), (2, 3), (3, 4)]", with an arbitrary number of elements. I'm trying to split it on the commas separating the coordinates, that is, to retrieve (1, 2), (2, 3), and (3, 4).

Can I do it in Java regex? I'm a complete noob but hoping Java regex is powerful enough for it. If it isn't, could you suggest an alternative?

Humphrey Bogart asked Feb 01 '10 21:02


2 Answers

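Since Java 5 you can use java.util.Scanner:

import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// coords is the input string, e.g. "[(1, 2), (2, 3), (3, 4)]"
Scanner sc = new Scanner(coords);
sc.useDelimiter("\\D+"); // treat every run of non-digits as a delimiter
List<Coord> result = new ArrayList<Coord>();
while (sc.hasNextInt()) {
    result.add(new Coord(sc.nextInt(), sc.nextInt())); // x first, then y
}
return result;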

EDIT: We don't know how many coordinates are passed in the string coords, which is why the loop runs until hasNextInt() returns false.
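For anyone who wants to run the Scanner idea as is, here is a minimal self-contained sketch. The class name ScannerCoords, the parse method and the Coord holder are made up for illustration (the answer does not define Coord), and it assumes non-negative integers, since \D+ also swallows a minus sign.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerCoords {
    // Hypothetical value class; the answer above does not define Coord.
    static final class Coord {
        final int x;
        final int y;

        Coord(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public String toString() {
            return "(" + x + ", " + y + ")";
        }
    }

    // Reads integers pairwise from a string such as "[(1, 2), (2, 3)]".
    static List<Coord> parse(String coords) {
        List<Coord> result = new ArrayList<Coord>();
        Scanner sc = new Scanner(coords);
        sc.useDelimiter("\\D+"); // every run of non-digits is a delimiter
        while (sc.hasNextInt()) {
            int x = sc.nextInt();
            int y = sc.nextInt(); // throws NoSuchElementException if x has no partner
            result.add(new Coord(x, y));
        }
        sc.close();
        return result;
    }

    public static void main(String[] args) {
        for (Coord c : parse("[(1, 2), (2, 3), (3, 4)]")) {
            System.out.println("x=" + c.x + ", y=" + c.y);
        }
    }
}
```

The loop does not need to know the number of pairs up front, and the Scanner skips the leading "[(" on its own because it matches the delimiter.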

Hubert answered Oct 31 '22 10:10


You can use String#split() for this.

String string = "[(1, 2), (2, 3), (3, 4)]";
string = string.substring(1, string.length() - 1); // Get rid of the square brackets.
String[] parts = string.split("(?<=\\))(,\\s*)(?=\\()");
for (String part : parts) {
    part = part.substring(1, part.length() - 1); // Get rid of parentheses.
    String[] coords = part.split(",\\s*");
    int x = Integer.parseInt(coords[0]);
    int y = Integer.parseInt(coords[1]);
    System.out.printf("x=%d, y=%d\n", x, y);
}

The (?<=\\)) positive lookbehind means that the match must be preceded by ). The (?=\\() positive lookahead means that it must be followed by (. The (,\\s*) is what is actually split on: the comma and any whitespace after it. The \\ before ) and ( are there just to escape regex-specific chars (the backslash is doubled because it sits inside a Java string literal).
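To see what the lookarounds buy you, here is a small sketch that compares a plain comma split with the lookaround split. The class and method names (LookaroundSplit, naiveSplit, pairSplit) are invented for this illustration, and it assumes the input always has the surrounding square brackets.

```java
class LookaroundSplit {
    // Strips the outer [ ] and splits on every comma, including those inside a pair.
    static String[] naiveSplit(String list) {
        String inner = list.substring(1, list.length() - 1);
        return inner.split(",\\s*");
    }

    // Strips the outer [ ] and splits only on commas that sit between ")" and "(".
    static String[] pairSplit(String list) {
        String inner = list.substring(1, list.length() - 1);
        return inner.split("(?<=\\)),\\s*(?=\\()");
    }

    public static void main(String[] args) {
        String input = "[(1, 2), (2, 3), (3, 4)]";

        System.out.println("Plain comma split:");
        for (String part : naiveSplit(input)) {
            System.out.println("  " + part); // six fragments such as "(1" and "2)"
        }

        System.out.println("Lookaround split:");
        for (String part : pairSplit(input)) {
            System.out.println("  " + part); // three intact pairs such as "(1, 2)"
        }
    }
}
```

The lookbehind and lookahead are zero-width, so the parentheses they check for stay in the resulting parts; only the comma and the whitespace are consumed.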

That said, this particular String is recognizable as the outcome of List#toString(). Are you sure you're doing things the right way? ;)

Update: as per the comments, you can indeed also do it the other way round and split on the non-digits:

String string = "[(1, 2), (2, 3), (3, 4)]";
String[] parts = string.split("\\D.");
for (int i = 1; i < parts.length; i += 3) {
    int x = Integer.parseInt(parts[i]);
    int y = Integer.parseInt(parts[i + 1]);
    System.out.printf("x=%d, y=%d\n", x, y);
}

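Here the \\D means that it is split on any non-digit (the \\d stands for digit). The . after it consumes one more character, so a two-character separator such as ", " is swallowed in a single match. The four-character separator "), (" is still matched as two pieces, which leaves an empty string in front of every x value; that is why the loop starts at index 1 and steps by 3. I must however admit that I'm not sure how to eliminate those blank matches before the digits. I'm not a trained regex guru yet. Hey, Bart K, can you do it better?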
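One possible way around the blank entries, offered as a sketch and not as something taken from the comments: strip the leading non-digits first, then split on whole runs of non-digits with \\D+. The class and method names (DigitSplit, numbers) are hypothetical, and the approach assumes non-negative integers that fit into an int.

```java
class DigitSplit {
    // Returns all integers found in the string, in order of appearance.
    static int[] numbers(String s) {
        // Remove leading non-digits so that split() produces no empty first element.
        String trimmed = s.replaceFirst("^\\D+", "");
        if (trimmed.isEmpty()) {
            return new int[0]; // "".split(...) would otherwise return one empty string
        }
        // Trailing empty strings are dropped by split() itself.
        String[] parts = trimmed.split("\\D+");
        int[] out = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] n = numbers("[(1, 2), (2, 3), (3, 4)]");
        // Consecutive numbers form one (x, y) pair.
        for (int i = 0; i + 1 < n.length; i += 2) {
            System.out.printf("x=%d, y=%d%n", n[i], n[i + 1]);
        }
    }
}
```

Because \\D+ matches a whole run of non-digits, the result no longer depends on how many characters separate the numbers, so the loop can simply step by 2 from index 0.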

After all, it's ultimately better to use a parser for this. See Hubert's answer on this topic.

BalusC answered Oct 31 '22 11:10