
How can I output only captured groups with sed?

Tags: regex, sed

Is there a way to tell sed to output only captured groups?

For example, given the input:

This is a sample 123 text and some 987 numbers 

And pattern:

/([\d]+)/ 

Could I get only 123 and 987 as output, formatted the way the back references specify?

Pablo asked May 06 '10 00:05





1 Answer

The key to getting this to work is to tell sed to exclude what you don't want to be output as well as specifying what you do want.

string='This is a sample 123 text and some 987 numbers'
echo "$string" | sed -rn 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'

This says:

  • don't default to printing each line (-n)
  • exclude zero or more non-digits
  • include one or more digits
  • exclude one or more non-digits
  • include one or more digits
  • exclude zero or more non-digits
  • print the substitution (p)
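The command above expects exactly two numbers on the line. As a rough sketch of the same "remove what you don't want" idea for any count of numbers, assuming a sed that accepts -E and a space as the desired separator, you could collapse each run of non-digits into one space and trim the ends (the variable name numbers is only for illustration):

```shell
string='This is a sample 123 text and some 987 numbers'
# Turn every run of non-digits into a single space, then trim both ends.
numbers=$(printf '%s\n' "$string" | sed -E 's/[^0-9]+/ /g; s/^ //; s/ $//')
echo "$numbers"   # prints: 123 987
```

This does not use capture groups at all, so it only fits when the text between the numbers can simply be discarded.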

In general, in sed you capture groups using parentheses and output what you capture using a back reference:

echo "foobarbaz" | sed 's/^foo\(.*\)baz$/\1/' 

will output "bar". If you use -r (GNU sed) or -E (BSD sed, including OS X; newer GNU sed accepts it too) for extended regex, you don't need to escape the parentheses:

echo "foobarbaz" | sed -r 's/^foo(.*)baz$/\1/' 
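Note that without -n, sed still prints lines where the substitution did not match, unchanged. A small sketch of combining -n with the p flag so that only the captured text from matching lines is output (the two-line input and the variable name matched are made up for illustration):

```shell
# Only the first input line matches; -n plus the p flag suppresses the second.
matched=$(printf 'foobarbaz\nno match here\n' | sed -n 's/^foo\(.*\)baz$/\1/p')
echo "$matched"   # prints: bar
```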

There can be up to 9 capture groups and their back references. The back references are numbered in the order the groups appear, but they can be used in any order and can be repeated:

echo "foobarbaz" | sed -r 's/^foo(.*)b(.)z$/\2 \1 \2/' 

outputs "a bar a".
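As another illustration of reordering back references, here is a sketch that rearranges a date. The input format and the variable name reordered are assumptions for the example, and | is used as the s delimiter so the slashes in the replacement need no escaping:

```shell
# Groups are numbered left to right: \1 = year, \2 = month, \3 = day.
reordered=$(echo '2010-05-06' | sed -E 's|^([0-9]+)-([0-9]+)-([0-9]+)$|\3/\2/\1|')
echo "$reordered"   # prints: 06/05/2010
```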

If you have GNU grep (the -P option is often not available in BSD grep, including the one on OS X):

echo "$string" | grep -Po '\d+' 

or variations such as:

echo "$string" | grep -Po '(?<=\D )(\d+)' 

The -P option enables Perl Compatible Regular Expressions. See man 3 pcrepattern or man 3 pcresyntax.
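If -P is not available, a plain extended regex is probably enough for the simple case of pulling out every number. A minimal sketch, assuming a grep that supports -o (both GNU and BSD grep do); the variable name matches is only for illustration:

```shell
string='This is a sample 123 text and some 987 numbers'
# -o prints each match on its own line; -E enables extended regex.
matches=$(echo "$string" | grep -Eo '[0-9]+')
echo "$matches"
```

This prints 123 and 987 on separate lines. It cannot express the lookbehind used in the -P variation above.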

Dennis Williamson answered Oct 04 '22 01:10
