I am trying to parse traceroute results in Java 8 using regex.
I am using the regex below to identify the groups:
^(\\d*).*[AS(\\d*)]?\\s+([\\w+\\.]+)\\s+\\(([\\d+\\.]+)\\)[\\s+(\\d+\\.\\d+)\\s+ms]+
Some example lines that I need to parse are:
1 10.33.128.1 (10.33.128.1) 4.452 ms 3.459 ms 3.474 ms
6 * [AS3356] 4.68.72.218 (4.68.72.218) 12.432 ms 11.819 ms
* 4.68.72.218 (4.68.72.218) 12.432 ms 11.819 ms
61.182.180.62 (61.182.180.62) 175.300 ms 203.001 ms
I want to extract the hop number (if available), ASN (if available), hostname, IP, and times. With the above regex, lines 1, 2, and 4 match, which is what I want, but I only get the hop, host, and ASN.
My code looks like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern hop_pattern = Pattern.compile(
    "^(\\d*).*[AS(\\d*)]?\\s+([\\w+\\.]+)\\s+\\(([\\d+\\.]+)\\)[\\s+(\\d+\\.\\d+)\\s+ms]+");
Matcher m = hop_pattern.matcher(target);
while (m.find()) {
    System.out.println("count: " + m.groupCount());
    for (int i = 1; i <= m.groupCount(); i++) {
        System.out.println(i + "->" + m.group(i));
    }
}
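A quick way to see what the pattern above actually captures, as a sketch (the class and method names are made up for illustration): inside square brackets, parentheses are literal characters, so [AS(\\d*)] and [\\s+(\\d+\\.\\d+)\\s+ms]+ are character classes, not groups. Only (\\d*), ([\\w+\\.]+) and ([\\d+\\.]+) capture, so groupCount() is 3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuestionPatternCheck {
    // The pattern from the question, copied unchanged.
    static final Pattern HOP_PATTERN = Pattern.compile(
        "^(\\d*).*[AS(\\d*)]?\\s+([\\w+\\.]+)\\s+\\(([\\d+\\.]+)\\)[\\s+(\\d+\\.\\d+)\\s+ms]+");

    // Number of capturing groups the compiled pattern defines.
    static int groupCount() {
        return HOP_PATTERN.matcher("").groupCount();
    }

    // Values of every group in the first match, or an empty list if nothing matches.
    static List<String> groups(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = HOP_PATTERN.matcher(line);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("count: " + groupCount());
        System.out.println(groups("1 10.33.128.1 (10.33.128.1) 4.452 ms 3.459 ms 3.474 ms"));
    }
}
```

On the first sample line this prints count: 3 followed by [1, 10.33.128.1, 10.33.128.1], i.e. hop, host, and IP; the ASN and the times are never captured by a group.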
I am not sure if something is wrong with the code or the regex itself. Thanks for your help!
Update: some more examples with the expected output:
1 [AS0] 10.200.200.200 (10.200.200.200) 37.526 ms 35.793 ms 37.728 ms
Expected Output: hop: 1 asn: 0 hostname: 10.200.200.200 ip: 10.200.200.200 time: [37.526, 35.793, 37.728]
2 [AS0] scsc-usr-13500-02-eth1-07.xyz.com (10.96.15.3) 37.927 ms 36.122 ms *
Expected Output: hop: 2 asn: 0 hostname: scsc-usr-13500-02-eth1-07.xyz.com ip: 10.96.15.3 time: [37.927, 36.122]
In order to capture everything you're looking for, you need to use two separate regular expressions. The reason is that a repeated capturing group only keeps the text from its last repetition, and each line of your traceroute results contains several times (e.g. 4.452 ms, 3.459 ms, and 3.474 ms in your first line).
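A minimal Java illustration of that behaviour (the demo pattern and class name are made up for this sketch): the capturing group inside (?:\s+([\d.]+) ms)+ is repeated three times on the first sample line, but the pattern still has only two groups, and group(2) reports just the last time.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastIterationDemo {
    // Hypothetical demo pattern: the IP in parentheses, then a repeated
    // non-capturing group that contains one capturing group for a time.
    static final Pattern REPEATED = Pattern.compile("\\(([\\d.]+)\\)(?:\\s+([\\d.]+) ms)+");

    // Returns group 2 of the first match: only the value from the last repetition survives.
    static String capturedTime(String line) {
        Matcher m = REPEATED.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String line = "1 10.33.128.1 (10.33.128.1) 4.452 ms 3.459 ms 3.474 ms";
        System.out.println(REPEATED.matcher(line).groupCount()); // 2, however many times there are
        System.out.println(capturedTime(line));                  // 3.474
    }
}
```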
For the sake of understanding which groups are being captured, you can use the following regex. It uses PCRE syntax and won't work in Java as-is, but the group names make it clear what each group captures.
^(?P<hop>\d+)?[\h*]*(?:\[AS(?<ASN>\d*)\])?\h+(?<hostname>[\w\.]+)\h+\((?<ip>[\d+\.]+)\)\h+(?<times>.*?)\h*$
With a slight modification, the above regex can be used in Java: this version replaces the horizontal-whitespace shorthand \h with an explicit [\ \t] class and uses plain numbered groups, since Java does not accept the PCRE-style (?P<name>...) syntax.
^(\d+)?[\ \t*]*(?:\[AS(\d*)\])?[\ \t]+([\w\.]+)[\ \t]+\(([\d+\.]+)\)[\ \t]+(.*?)[\ \t]*$
Note: both the global (g) and multi-line (m) modifiers are used.
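Java has no g flag; a common equivalent is compiling with Pattern.MULTILINE (so ^ and $ match at every line) and calling Matcher.find() in a loop. A sketch of the numbered-group regex used that way (the class and method names are illustrative). One assumption is flagged in the code: the fourth sample line is given traceroute's usual leading indentation, because the pattern requires whitespace before the hostname.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberedGroupsDemo {
    // Same pattern as above; "\ " is written as a plain space, which is equivalent
    // inside a character class. MULTILINE lets ^ and $ match at every line.
    static final Pattern HOP = Pattern.compile(
        "^(\\d+)?[ \\t*]*(?:\\[AS(\\d*)\\])?[ \\t]+([\\w.]+)[ \\t]+\\(([\\d+.]+)\\)[ \\t]+(.*?)[ \\t]*$",
        Pattern.MULTILINE);

    static final String SAMPLE = String.join("\n",
        "1 10.33.128.1 (10.33.128.1) 4.452 ms 3.459 ms 3.474 ms",
        "6 * [AS3356] 4.68.72.218 (4.68.72.218) 12.432 ms 11.819 ms",
        "* 4.68.72.218 (4.68.72.218) 12.432 ms 11.819 ms",
        // Assumption: this continuation line keeps traceroute's leading indentation.
        // Without it the line does not match, since whitespace is required before the hostname.
        "   61.182.180.62 (61.182.180.62) 175.300 ms 203.001 ms");

    // Each row holds groups 1..5: hop, ASN, hostname, ip, times (hop and ASN may be null).
    static List<String[]> parse(String text) {
        List<String[]> rows = new ArrayList<>();
        Matcher m = HOP.matcher(text);
        while (m.find()) { // repeated find() plays the role of the g flag
            rows.add(new String[] {m.group(1), m.group(2), m.group(3), m.group(4), m.group(5)});
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : parse(SAMPLE)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```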
Run this second regular expression on the times group captured by the first one to gather a list of all the times.
([\d.]+)
Sample input for the first regex:
1 10.33.128.1 (10.33.128.1) 4.452 ms 3.459 ms 3.474 ms
6 * [AS3356] 4.68.72.218 (4.68.72.218) 12.432 ms 11.819 ms
* 4.68.72.218 (4.68.72.218) 12.432 ms 11.819 ms
61.182.180.62 (61.182.180.62) 175.300 ms 203.001 ms

Matches of the first regex (full match, then each group that participated):
Match 1: 1 10.33.128.1 (10.33.128.1) 4.452 ms 3.459 ms 3.474 ms
  hop: 1
  hostname: 10.33.128.1
  ip: 10.33.128.1
  times: 4.452 ms 3.459 ms 3.474 ms
Match 2: 6 * [AS3356] 4.68.72.218 (4.68.72.218) 12.432 ms 11.819 ms
  hop: 6
  ASN: 3356
  hostname: 4.68.72.218
  ip: 4.68.72.218
  times: 12.432 ms 11.819 ms
Match 3: * 4.68.72.218 (4.68.72.218) 12.432 ms 11.819 ms
  hostname: 4.68.72.218
  ip: 4.68.72.218
  times: 12.432 ms 11.819 ms
Match 4: 61.182.180.62 (61.182.180.62) 175.300 ms 203.001 ms
  hostname: 61.182.180.62
  ip: 61.182.180.62
  times: 175.300 ms 203.001 ms

Matches of the second regex on the times from Match 1 (4.452 ms 3.459 ms 3.474 ms):
Match 1: 4.452 (group 1: 4.452)
Match 2: 3.459 (group 1: 3.459)
Match 3: 3.474 (group 1: 3.474)
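In Java, that second pass is a find() loop over the times group. A sketch (the class and method names are illustrative); a probe that timed out is printed by traceroute as *, which simply produces no match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimesDemo {
    // Second regex from the answer: each run of digits and dots is one time value.
    static final Pattern TIME = Pattern.compile("([\\d.]+)");

    // Collects every time found in the captured "times" text, in order.
    static List<String> times(String timesGroup) {
        List<String> out = new ArrayList<>();
        Matcher m = TIME.matcher(timesGroup);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(times("4.452 ms 3.459 ms 3.474 ms")); // [4.452, 3.459, 3.474]
        System.out.println(times("37.927 ms 36.122 ms *"));      // [37.927, 36.122]
    }
}
```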
Thank you to Casimir et Hippolyte for pointing out that Java does indeed allow named capture groups as other regex flavors do.
Here's an updated regex using named capture groups (?<name>...), which Java does support. Note that Java only accepts the (?<name>...) form, not PCRE's (?P<name>...), so the hop group is written (?<hop>...) as well:
^(?<hop>\d+)?[\t *]*(?:\[AS(?<ASN>\d*)\])?[\t ]+(?<hostname>[\w\.]+)[\t ]+\((?<ip>[\d+\.]+)\)[\t ]+(?<times>.*?)[\t ]*$
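Putting both passes together with named groups, here is a sketch of a parser that prints the format from the question's update (the class and method names are illustrative). Two details are assumptions of this sketch: the hop group is written (?<hop>...), the form Java accepts, and the hostname class also allows - so that names like scsc-usr-13500-02-eth1-07.xyz.com from the update match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupsParser {
    // Named-group version of the regex; "-" added to the hostname class (assumption, see above).
    static final Pattern HOP = Pattern.compile(
        "^(?<hop>\\d+)?[\\t *]*(?:\\[AS(?<ASN>\\d*)\\])?[\\t ]+(?<hostname>[\\w.\\-]+)"
        + "[\\t ]+\\((?<ip>[\\d+.]+)\\)[\\t ]+(?<times>.*?)[\\t ]*$");
    // Second pass: each run of digits and dots in the times text is one value.
    static final Pattern TIME = Pattern.compile("[\\d.]+");

    // Returns one summary line in the question's expected format, or null if the line does not match.
    static String describe(String line) {
        Matcher m = HOP.matcher(line);
        if (!m.matches()) {
            return null;
        }
        List<String> times = new ArrayList<>();
        Matcher t = TIME.matcher(m.group("times"));
        while (t.find()) {
            times.add(t.group());
        }
        // Absent optional groups (hop, ASN) come back as null and print as "null".
        return "hop: " + m.group("hop") + " asn: " + m.group("ASN")
            + " hostname: " + m.group("hostname") + " ip: " + m.group("ip")
            + " time: " + times;
    }

    public static void main(String[] args) {
        System.out.println(describe(
            "1 [AS0] 10.200.200.200 (10.200.200.200) 37.526 ms 35.793 ms 37.728 ms"));
        System.out.println(describe(
            "2 [AS0] scsc-usr-13500-02-eth1-07.xyz.com (10.96.15.3) 37.927 ms 36.122 ms *"));
    }
}
```

For the two update lines this prints exactly the expected output shown in the question.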