How can I output only captured groups with sed?

Tags:

regex

sed

Is there a way to tell sed to output only captured groups?

For example, given the input:

This is a sample 123 text and some 987 numbers

And pattern:

/([\d]+)/

Could I get only 123 and 987 output in the way formatted by back references?

830

asked May 06 '10 00:05

Pablo

1 Answer

The key to getting this to work is to tell sed to exclude what you don't want to be output as well as specifying what you do want.

string='This is a sample 123 text and some 987 numbers'
echo "$string" | sed -rn 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'

This says:

don't default to printing each line (-n)
exclude zero or more non-digits
include one or more digits
exclude one or more non-digits
include one or more digits
exclude zero or more non-digits
print the substitution (p)
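As a rough sketch of how -n and p interact, the same command can be run on two lines, one of which has no digits. The second input line is made up for illustration, and -E is used in place of -r on the assumption that your sed accepts it (GNU and BSD sed both do):

```shell
# Hypothetical two-line input; the second line contains no digits.
input='This is a sample 123 text and some 987 numbers
no digits on this line'

# -n suppresses automatic printing; the trailing p prints a line only
# when the substitution actually matched it.
two_numbers=$(printf '%s\n' "$input" |
  sed -En 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p')

printf '%s\n' "$two_numbers"   # prints: 123 987
```

The line without digits never matches, so it is dropped rather than echoed unchanged.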

In general, in sed you capture groups using parentheses and output what you capture using a back reference:

echo "foobarbaz" | sed 's/^foo\(.*\)baz$/\1/'

will output "bar". If you use -r (GNU sed) or -E (BSD sed, including OS X; recent GNU sed accepts it too) for extended regex, you don't need to escape the parentheses:

echo "foobarbaz" | sed -r 's/^foo(.*)baz$/\1/'
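If neither -r nor -E is available, the two-number extraction from the start of this answer can probably be written in basic regex as well. This is a sketch that assumes a POSIX-style sed: the parentheses are escaped and + is spelled as the interval \{1,\}:

```shell
string='This is a sample 123 text and some 987 numbers'

# Basic regex: \( \) delimit the groups and \{1,\} means "one or more".
bre_numbers=$(echo "$string" |
  sed -n 's/[^[:digit:]]*\([[:digit:]]\{1,\}\)[^[:digit:]]\{1,\}\([[:digit:]]\{1,\}\)[^[:digit:]]*/\1 \2/p')

echo "$bre_numbers"   # prints: 123 987
```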

There can be up to 9 capture groups, referenced as \1 through \9. The back references are numbered in the order the groups appear, but they can be used in any order and can be repeated:

echo "foobarbaz" | sed -r 's/^foo(.*)b(.)z$/\2 \1 \2/'

outputs "a bar a".
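A slightly more practical illustration of reordering, using a made-up ISO-style date as input. The | delimiter is chosen only so the slashes in the replacement need no escaping:

```shell
# Hypothetical input date in year-month-day order.
iso_date='2010-05-06'

# Three groups are captured and emitted in reverse order.
reordered=$(echo "$iso_date" |
  sed -E 's|^([0-9]{4})-([0-9]{2})-([0-9]{2})$|\3/\2/\1|')

echo "$reordered"   # prints: 06/05/2010
```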

If you have GNU grep (the -P option may not be available in BSD grep, including the one shipped with OS X):

echo "$string" | grep -Po '\d+'

or variations such as:

echo "$string" | grep -Po '(?<=\D )(\d+)'

The -P option enables Perl Compatible Regular Expressions. See man 3 pcrepattern or man 3 pcresyntax.
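Where -P is not available, a plain extended regex with -o may be enough for this particular input. This is a sketch that swaps \d for the bracket expression [0-9]:

```shell
string='This is a sample 123 text and some 987 numbers'

# -o prints each match on its own line; -E selects extended regex.
all_numbers=$(echo "$string" | grep -Eo '[0-9]+')

echo "$all_numbers"
# prints:
# 123
# 987
```

Unlike the sed approach, this prints one match per line and does not need the pattern to describe the text between the numbers.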

189

answered Oct 04 '22 01:10

Dennis Williamson
