Is there a way to tell sed
to output only captured groups?
For example, given the input:
This is a sample 123 text and some 987 numbers
And pattern:
/([\d]+)/
Could I get only 123 and 987 output in the way formatted by back references?
Grouping can be used in sed like normal regular expression. A group is opened with “\(” and closed with “\)”. Grouping can be used in combination with back-referencing. Back-reference is the re-use of a part of a Regular Expression selected by grouping.
Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d" "o" and "g" .
Mixing named and numbered capturing groups is not recommended because flavors are inconsistent in how the groups are numbered. If a group doesn't need to have a name, make it non-capturing using the (?:group) syntax. In . NET you can make all unnamed groups non-capturing by setting RegexOptions.
Capturing groups are a handy feature of regular expression matching that allows us to query the Match object to find out the part of the string that matched against a particular part of the regular expression. Anything you have in parentheses () will be a capture group.
The key to getting this to work is to tell sed
to exclude what you don't want to be output as well as specifying what you do want.
string='This is a sample 123 text and some 987 numbers' echo "$string" | sed -rn 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'
This says:
-n
)p
)In general, in sed
you capture groups using parentheses and output what you capture using a back reference:
echo "foobarbaz" | sed 's/^foo\(.*\)baz$/\1/'
will output "bar". If you use -r
(-E
for OS X) for extended regex, you don't need to escape the parentheses:
echo "foobarbaz" | sed -r 's/^foo(.*)baz$/\1/'
There can be up to 9 capture groups and their back references. The back references are numbered in the order the groups appear, but they can be used in any order and can be repeated:
echo "foobarbaz" | sed -r 's/^foo(.*)b(.)z$/\2 \1 \2/'
outputs "a bar a".
If you have GNU grep
(it may also work in BSD, including OS X):
echo "$string" | grep -Po '\d+'
or variations such as:
echo "$string" | grep -Po '(?<=\D )(\d+)'
The -P
option enables Perl Compatible Regular Expressions. See man 3 pcrepattern
or man 3 pcresyntax
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With