Regex to capture groups in traceroute with Java8

Tags: java, regex, java-8

I am trying to parse traceroute results in Java 8 using a regex.

I am using the regex below to identify the groups.

^(\\d*).*[AS(\\d*)]?\\s+([\\w+\\.]+)\\s+\\(([\\d+\\.]+)\\)[\\s+(\\d+\\.\\d+)\\s+ms]+

Some example lines that I need to parse are:

1  10.33.128.1 (10.33.128.1)  4.452 ms  3.459 ms  3.474 ms  
6  * [AS3356] 4.68.72.218 (4.68.72.218)  12.432 ms  11.819 ms  
 * 4.68.72.218 (4.68.72.218)  12.432 ms  11.819 ms  
  61.182.180.62 (61.182.180.62) 175.300 ms  203.001 ms

I want to extract the hop number (if available), the ASN (if available), the hostname, the IP, and the times. With the above regex, lines 1, 2, and 4 match, which is what I want, but I only get the hop, host, and ASN.

My code is like this:

Pattern hop_pattern = Pattern.compile(
        "^(\\d*).*[AS(\\d*)]?\\s+([\\w+\\.]+)\\s+\\(([\\d+\\.]+)\\)[\\s+(\\d+\\.\\d+)\\s+ms]+");
Matcher m = hop_pattern.matcher(target);

while(m.find()) {
    System.out.println("count: " + m.groupCount());
    for(int i = 1; i < m.groupCount() + 1; i++) {
        System.out.println(i + "->" + m.group(i));
    }
}

I am not sure if something is wrong with the code or the regex itself. Thanks for help!

Update: Some examples and sample output

1 [AS0] 10.200.200.200 (10.200.200.200) 37.526 ms 35.793 ms 37.728 ms
Expected Output: hop: 1 asn: 0 hostname: 10.200.200.200 ip: 10.200.200.200 time: [37.526, 35.793, 37.728]

2 [AS0] scsc-usr-13500-02-eth1-07.xyz.com (10.96.15.3) 37.927 ms 36.122 ms *
Expected Output: hop: 2 asn: 0 hostname: scsc-usr-13500-02-eth1-07.xyz.com ip: 10.96.15.3 time: [37.927, 36.122]

Ankur asked Oct 18 '22 05:10


1 Answer

Part 1

To capture everything you're looking for, you need two separate regular expressions. The reason is that a capture group inside a repeated construct keeps only the last substring it matched, and each of your traceroute lines contains several times (e.g. 4.452 ms, 3.459 ms, and 3.474 ms in your first line).
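To illustrate the point about repeated groups, here is a minimal Java sketch. The class and method names are hypothetical, and the pattern is a cut-down stand-in for the "times" tail of a traceroute line, not the regex from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Group 1 sits inside a repeated (?:...)+ so it is overwritten on every iteration.
    static final Pattern REPEATED = Pattern.compile("(?:\\s+(\\d+\\.\\d+)\\s+ms)+");

    // Returns what group 1 holds once the whole repetition has matched.
    static String lastCaptured(String input) {
        Matcher m = REPEATED.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // All three times are consumed by the match, but only the final one stays in group 1.
        System.out.println(lastCaptured("  4.452 ms  3.459 ms  3.474 ms")); // prints 3.474
    }
}
```

This is why the times are captured as one string first and split up in a second pass.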

For the sake of understanding which groups are being captured, you can use the following regex. It is written for PCRE and won't compile in Java as-is because of the (?P<hop>...) group, but the named groups give a clear indication of what each group captures.

This code can be seen in use here

^(?P<hop>\d+)?[\h*]*(?:\[AS(?<ASN>\d*)\])?\h+(?<hostname>[\w\.]+)\h+\((?<ip>[\d+\.]+)\)\h+(?<times>.*?)\h*$

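With a slight modification, the above regex can be used in Java. The version below swaps the horizontal whitespace class \h for [ \t] and the named groups for numbered ones. (Neither swap is strictly required: Java 8's Pattern supports \h, and Java supports named groups written as (?<name>...), see Edits.)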

This code can be seen in use here

^(\d+)?[\ \t*]*(?:\[AS(\d*)\])?[\ \t]+([\w\.]+)[\ \t]+\(([\d+\.]+)\)[\ \t]+(.*?)[\ \t]*$

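Note: Both the global g and multi-line m modifiers are used. In Java, the equivalents are calling Matcher.find() in a loop and compiling the pattern with Pattern.MULTILINE.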
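As a rough sketch of how the Part 1 regex might be wired up in Java: the class name TracerouteLines and the parse helper are hypothetical, and the sample input is taken from the question. Backslashes are doubled for the Java string literal, and the escaped space is written as a plain space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TracerouteLines {
    // MULTILINE makes ^ and $ match at line boundaries (the "m" modifier);
    // looping over find() visits every matching line (the "g" modifier).
    static final Pattern HOP_LINE = Pattern.compile(
            "^(\\d+)?[ \\t*]*(?:\\[AS(\\d*)\\])?[ \\t]+([\\w\\.]+)[ \\t]+"
          + "\\(([\\d+\\.]+)\\)[ \\t]+(.*?)[ \\t]*$",
            Pattern.MULTILINE);

    // Returns one String[5] per matching line: hop, ASN, hostname, IP, times.
    // Optional groups that did not participate (hop, ASN) come back as null.
    static List<String[]> parse(String traceroute) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = HOP_LINE.matcher(traceroute);
        while (m.find()) {
            String[] row = new String[m.groupCount()];
            for (int i = 1; i <= m.groupCount(); i++) {
                row[i - 1] = m.group(i);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String target =
                "1  10.33.128.1 (10.33.128.1)  4.452 ms  3.459 ms  3.474 ms  \n"
              + "6  * [AS3356] 4.68.72.218 (4.68.72.218)  12.432 ms  11.819 ms  \n";
        for (String[] row : parse(target)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Checking for null before using the hop or ASN value is needed, since lines without them still match.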


Part 2

Run this second regular expression on the times you capture in Part 1 to gather a list of all the times.

This code can be seen in use here

([\d.]+)
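A minimal sketch of this second pass in Java, assuming the times string has already been captured by the Part 1 regex. TimeExtractor and times are hypothetical names; the values are kept as strings so nothing is assumed about how they should be parsed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimeExtractor {
    // The Part 2 regex: each run of digits and dots is one time value.
    static final Pattern TIME = Pattern.compile("([\\d.]+)");

    // Collects every time found in the captured "times" string, in order.
    static List<String> times(String captured) {
        List<String> found = new ArrayList<>();
        Matcher m = TIME.matcher(captured);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // The "ms" units and a trailing "*" (a probe with no reply) are skipped.
        System.out.println(times("37.927 ms 36.122 ms *")); // prints [37.927, 36.122]
    }
}
```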





Results

Part 1

Input

1  10.33.128.1 (10.33.128.1)  4.452 ms  3.459 ms  3.474 ms  
6  * [AS3356] 4.68.72.218 (4.68.72.218)  12.432 ms  11.819 ms  
 * 4.68.72.218 (4.68.72.218)  12.432 ms  11.819 ms  
  61.182.180.62 (61.182.180.62) 175.300 ms  203.001 ms

Output

Match 1

  • Full match 0-60 1 10.33.128.1 (10.33.128.1) 4.452 ms 3.459 ms 3.474 ms
  • Group 1. 1
  • Group 3. 10.33.128.1
  • Group 4. 10.33.128.1
  • Group 5. 4.452 ms 3.459 ms 3.474 ms

Match 2

  • Full match 61-124 6 * [AS3356] 4.68.72.218 (4.68.72.218) 12.432 ms 11.819 ms
  • Group 1. 6
  • Group 2. 3356
  • Group 3. 4.68.72.218
  • Group 4. 4.68.72.218
  • Group 5. 12.432 ms 11.819 ms

Match 3

  • Full match 125-177 * 4.68.72.218 (4.68.72.218) 12.432 ms 11.819 ms
  • Group 3. 4.68.72.218
  • Group 4. 4.68.72.218
  • Group 5. 12.432 ms 11.819 ms

Match 4

  • Full match 178-232 61.182.180.62 (61.182.180.62) 175.300 ms 203.001 ms
  • Group 3. 61.182.180.62
  • Group 4. 61.182.180.62
  • Group 5. 175.300 ms 203.001 ms

Part 2

Input

4.452 ms  3.459 ms  3.474 ms 

Output

Match 1

  • Full match 0-5 4.452
  • Group 1. 4.452

Match 2

  • Full match 10-15 3.459
  • Group 1. 3.459

Match 3

  • Full match 20-25 3.474
  • Group 1. 3.474





Edits

Thank you to Casimir et Hippolyte for pointing out that Java does indeed allow named capture groups as other regex flavors do.

Here's an updated regex, since Java does support named capture groups written as (?<name>...). Java does not accept the (?P<name>...) spelling, so the first group is written as (?<hop>...) here.

This regex can be seen in use here

^(?<hop>\d+)?[\t *]*(?:\[AS(?<ASN>\d*)\])?[\t ]+(?<hostname>[\w\.]+)[\t ]+\((?<ip>[\d+\.]+)\)[\t ]+(?<times>.*?)[\t ]*$
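Putting both parts together with named groups, a sketch in Java could look like the following. HopParser and describe are hypothetical names, and the output format simply mirrors the expected output in the question's update. Two deviations from the regex above are assumptions of this sketch: every group uses Java's (?<name>...) spelling, and a '-' is added to the hostname class, because the question's second update sample has a hyphenated hostname that [\w\.]+ does not match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HopParser {
    // Named groups in Java's (?<name>...) spelling. The '-' in the hostname
    // class is this sketch's own addition so hyphenated hostnames match.
    static final Pattern HOP_LINE = Pattern.compile(
            "^(?<hop>\\d+)?[\\t *]*(?:\\[AS(?<ASN>\\d*)\\])?[\\t ]+(?<hostname>[\\w.-]+)"
          + "[\\t ]+\\((?<ip>[\\d+\\.]+)\\)[\\t ]+(?<times>.*?)[\\t ]*$",
            Pattern.MULTILINE);
    static final Pattern TIME = Pattern.compile("[\\d.]+");

    // Formats the first matching line, or returns null when nothing matches.
    static String describe(String line) {
        Matcher m = HOP_LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        // Second pass: pull the individual times out of the captured string.
        List<String> times = new ArrayList<>();
        Matcher t = TIME.matcher(m.group("times"));
        while (t.find()) {
            times.add(t.group());
        }
        // hop and ASN print as "null" when the optional groups did not participate.
        return "hop: " + m.group("hop") + " asn: " + m.group("ASN")
                + " hostname: " + m.group("hostname") + " ip: " + m.group("ip")
                + " time: " + times;
    }

    public static void main(String[] args) {
        System.out.println(describe(
                "2 [AS0] scsc-usr-13500-02-eth1-07.xyz.com (10.96.15.3) 37.927 ms 36.122 ms *"));
    }
}
```

For the line passed in main, this should print the second expected output from the question: hop: 2 asn: 0 hostname: scsc-usr-13500-02-eth1-07.xyz.com ip: 10.96.15.3 time: [37.927, 36.122].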
ctwheels answered Nov 03 '22 18:11