I would like to remove everything after the 2nd occurrence of a particular
pattern in a string. What is the best way to do it in Unix? What is most elegant and simple method to achieve this; sed
, awk
or just unix commands like cut
?
My input would be
After-u-math-how-however
Output should be
After-u
Everything after the 2nd -
should be stripped out. The regex should also match
zero occurrences of the pattern, so zero or one occurrence should be ignored and
from the 2nd occurrence everything should be removed.
So if the input is as follows
After
Output should be
After
Something like this would do it.
echo "After-u-math-how-however" | cut -f1,2 -d'-'
This will split up (cut) the string into fields, using a dash (-
) as the delimiter. Once the string has been split into fields, cut
will print the 1st and 2nd fields.
This might work for you (GNU sed):
sed 's/-[^-]*//2g' file
You could use the following regex to select what you want:
^[^-]*-\?[^-]*
For example:
echo "After-u-math-how-however" | grep -o "^[^-]*-\?[^-]*"
Results:
After-u
@EvanPurkisher's cut -f1,2 -d'-'
solution is IMHO the best one but since you asked about sed and awk:
With GNU sed for -r
$ echo "After-u-math-how-however" | sed -r 's/([^-]+-[^-]*).*/\1/'
After-u
With GNU awk for gensub()
:
$ echo "After-u-math-how-however" | awk '{$0=gensub(/([^-]+-[^-]*).*/,"\\1","")}1'
After-u
Can be done with non-GNU sed using \(
and *
, and with non-GNU awk using match()
and substr()
if necessary.
awk -F - '{print $1 (NF>1? FS $2 : "")}' <<<'After-u-math-how-however'
-
(option spec. -F -
) - accessible as special variable FS
inside the awk
program.print $1
), followed by:
NF>1
), append FS
(i.e., -
) and the 2nd field ($2
)""
, i.e.: effectively only print the 1st field (which in itself may be empty, if the input is empty).If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With