I am trying to match IP addresses found in the output of traceroute
by means of a regex. I'm not trying to validate them, because it's safe enough to assume that traceroute's output
is valid (i.e. it is not outputting something like 999.999.999.999).
I'm trying the following regex:
([0-9]{1,3}.?){4}
I'm testing it in regex101 and it does validate an IP address. However, when I try
echo '192.168.1.1 foobar' | grep '([0-9]{1,3}.?){4}'
I get nothing. What am I missing?
In Linux you can use regular expressions with grep to extract an IP address from a file. The grep command has the -E (extended regex) option, which makes it interpret a pattern as an extended regular expression.
You used a POSIX ERE pattern, but did not pass the -E option to make grep use the POSIX ERE flavor. Thus, grep used POSIX BRE instead, where you need to escape the {n,m} quantifier and the (...) group so that they are parsed as special regex operators rather than as literal characters.
Note that you also need to escape the . so that it can only match a literal dot.
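As a rough illustration of the BRE/ERE difference, the sketch below runs the question's unescaped pattern in both modes. It assumes GNU grep; the string `(7{1,3}x?){4}` is a made-up input chosen only because it spells out the pattern's characters literally.

```shell
# The pattern from the question, unescaped.
pattern='([0-9]{1,3}.?){4}'

# BRE (default): ( ) { } ? are ordinary characters, so an IP address does not match.
bre_ip=$(printf '%s\n' '192.168.1.1 foobar' | grep "$pattern" || true)

# BRE: the same pattern does match a line containing those characters literally.
bre_literal=$(printf '%s\n' '(7{1,3}x?){4}' | grep "$pattern" || true)

# ERE (-E): ( ) and {n,m} are operators, so the IP line matches.
ere_ip=$(printf '%s\n' '192.168.1.1 foobar' | grep -E "$pattern" || true)

printf 'BRE on IP:      [%s]\n' "$bre_ip"
printf 'BRE on literal: [%s]\n' "$bre_literal"
printf 'ERE on IP:      [%s]\n' "$ere_ip"
```

The `|| true` only keeps the script going when grep finds nothing and exits with status 1.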
To make your pattern work with grep the way you wanted, you could use:
grep -E '([0-9]{1,3}\.?){4}' # POSIX ERE
grep '\([0-9]\{1,3\}\.\?\)\{4\}' # BRE version of the same regex (\? is a GNU extension, not POSIX BRE)
See an online demo.
However, this regex will also match a string of several digits, because the . is optional.
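A quick way to see the problem is to feed the pattern a run of digits with no dots at all. The input below is a made-up sample:

```shell
# ERE pattern with an optional dot after each group of digits.
loose='([0-9]{1,3}\.?){4}'

# Twelve digits and no dots: each of the four repetitions consumes three digits.
digits_only=$(printf '%s\n' '123456789012' | grep -oE "$loose" || true)

printf 'matched: %s\n' "$digits_only"   # prints: matched: 123456789012
```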
You may solve it by unrolling the pattern as
grep -E '[0-9]{1,3}(\.[0-9]{1,3}){3}' # POSIX ERE
grep '[0-9]\{1,3\}\(\.[0-9]\{1,3\}\)\{3\}' # POSIX BRE
See another demo.
Basically, it matches:
[0-9]{1,3} - 1 to 3 occurrences of any ASCII digit
(\.[0-9]{1,3}){3} - 3 occurrences of:
    \. - a literal .
    [0-9]{1,3} - 1 to 3 occurrences of any ASCII digit

To make sure you only match valid IPs, you might want to use a more precise IP matching regex:
grep -E '\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}\b' # ERE (\b is a GNU extension)
See this online demo.
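To check how the stricter pattern behaves, it can be run over a few sample strings. This sketch assumes GNU grep (for `\b`), and the three inputs are invented for illustration:

```shell
# Each octet is limited to 0-255; \b keeps the match from starting or ending mid-number.
strict='\b(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}\b'

# One in-range address and two with out-of-range octets.
valid=$(printf '%s\n' '192.168.1.1' '999.999.999.999' '10.0.0.256' | grep -oE "$strict" || true)

printf '%s\n' "$valid"   # prints only: 192.168.1.1
```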
You may further tweak it with word boundaries (can be \< / \> or \b), etc.
To extract the IPs, use the -o option with grep: grep -oE 'ERE_pattern' file / grep -o 'BRE_pattern' file.
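As a sketch of the extraction step, the lines below imitate traceroute output. The hostnames, addresses and timings are made up, and real traceroute output may be formatted differently:

```shell
# Hypothetical traceroute-like lines (documentation-range addresses).
sample=$(printf '%s\n' \
  ' 1  gw.example (192.0.2.1)  0.512 ms  0.498 ms' \
  ' 2  198.51.100.7 (198.51.100.7)  4.120 ms  4.087 ms')

# -o prints each match on its own line instead of the whole matching line.
ips=$(printf '%s\n' "$sample" | grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}')

# Optionally collapse repeated addresses.
unique_ips=$(printf '%s\n' "$ips" | sort -u)

printf '%s\n' "$unique_ips"
```

The timing values such as 0.512 are not picked up, because the pattern requires four dot-separated groups of digits.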
For more robust validation, it is better to use a function instead of a simple regex match:
#!/bin/bash
is_valid_ip() {
    local arr element
    IFS=. read -r -a arr <<< "$1"                    # split the IP string into an array on dots
    [[ ${#arr[@]} != 4 ]] && return 1                # doesn't have four parts
    for element in "${arr[@]}"; do
        [[ $element =~ ^[0-9]+$ ]] || return 1       # non-numeric characters found
        [[ $element =~ ^0[0-9]+$ ]] && return 1      # a leading 0 followed by other digits is rejected, so it can't be interpreted as an octal number
        ((element < 0 || element > 255)) && return 1 # number out of range
    done
    return 0
}
You can invoke this as:
while read -r ip; do
    is_valid_ip "$ip" && printf '%s\n' "$ip"
done < <(your command that extracts ip address like strings)
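The leading-zero check in is_valid_ip matters because shell arithmetic treats a number with a leading 0 as octal. A minimal illustration:

```shell
# A leading 0 makes the shell read the digits as base 8.
decimal=$((10))      # ten
octal=$((010))       # octal 10, i.e. eight
octal_max=$((0377))  # octal 377, i.e. 255

printf '10 -> %s, 010 -> %s, 0377 -> %s\n' "$decimal" "$octal" "$octal_max"
```

So without that check, an octet written as 010 would be range-checked as 8 rather than 10.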