 

Remove everything after 2nd occurrence in a string in unix

I would like to remove everything after the 2nd occurrence of a particular pattern in a string. What is the best way to do it in Unix? What is the most elegant and simple method to achieve this: sed, awk, or just Unix commands like cut?

My input would be

After-u-math-how-however

Output should be

After-u

Everything from the 2nd - onward should be stripped out. The solution must also handle input with no occurrence of the pattern: if the pattern occurs zero times or once, the string should be left unchanged, and only from the 2nd occurrence onward should everything be removed.

So if the input is as follows

After

Output should be

After
asked May 16 '14 00:05 by Jose



5 Answers

Something like this would do it.

echo "After-u-math-how-however" | cut -f1,2 -d'-'

This splits (cuts) the string into fields, using a dash (-) as the delimiter, and prints only the 1st and 2nd fields, joined by that delimiter. A line that contains no dash at all is printed unchanged, so After stays After.
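A quick way to check both of the question's cases at once is to feed several lines through the same cut command. The helper name keep_two_fields is hypothetical, used only for this sketch:

```shell
#!/bin/sh
# keep_two_fields: hypothetical helper name for this example.
# Keeps the 1st and 2nd '-'-separated fields of each input line.
keep_two_fields() {
  cut -d'-' -f1,2
}

# Without -s, cut prints lines that contain no delimiter unchanged.
# Expected output: After-u, After, a-b (one per line)
printf '%s\n' 'After-u-math-how-however' 'After' 'a-b' | keep_two_fields
```

Because cut works line by line, the same command handles a whole file of such strings.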

answered Nov 04 '22 12:11 by Evan Purkhiser



This might work for you (GNU sed):

sed 's/-[^-]*//2g' file
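For readers unfamiliar with the flags: in GNU sed, a number combined with g (here 2g) replaces the 2nd match and every match after it. A sketch that runs the command on the question's inputs, assuming GNU sed (strip_from_second is a hypothetical helper name):

```shell
#!/bin/sh
# strip_from_second: hypothetical helper name; requires GNU sed for "2g".
strip_from_second() {
  # Each match of -[^-]* is a dash plus the run of non-dashes after it.
  # The 1st match is kept; the 2nd and all later matches are deleted.
  sed 's/-[^-]*//2g'
}

# Expected output: After-u, After, After-u (one per line)
printf '%s\n' 'After-u-math-how-however' 'After' 'After-u' | strip_from_second
```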
answered Nov 04 '22 12:11 by potong



You could use the following regex to select what you want:

^[^-]*-\?[^-]*

For example (with GNU grep, since \? in a basic regular expression is a GNU extension):

echo "After-u-math-how-however" | grep -o "^[^-]*-\?[^-]*"

Results:

After-u
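The same grep command applied to both sample inputs, assuming GNU grep (first_two is a hypothetical helper name):

```shell
#!/bin/sh
# first_two: hypothetical helper name; assumes GNU grep for \? in a BRE.
first_two() {
  # -o prints only the matched part: the 1st field, an optional '-',
  # and the run of non-dashes that follows it.
  grep -o '^[^-]*-\?[^-]*'
}

# Expected output: After-u, After (one per line)
printf '%s\n' 'After-u-math-how-however' 'After' | first_two
```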
answered Nov 04 '22 13:11 by Steve



@EvanPurkhiser's cut -f1,2 -d'-' solution is IMHO the best one, but since you asked about sed and awk:

With GNU sed for -r:

$ echo "After-u-math-how-however" | sed -r 's/([^-]+-[^-]*).*/\1/'
After-u
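GNU sed also accepts -E as a synonym for -r, and BSD/macOS sed supports -E as well, so a sketch of a slightly more portable spelling of the same substitution is (cut_after_second is a hypothetical helper name):

```shell
#!/bin/sh
# cut_after_second: hypothetical helper name.
# -E selects extended regular expressions in both GNU and BSD sed.
cut_after_second() {
  sed -E 's/([^-]+-[^-]*).*/\1/'
}

# A line with no '-' never matches, so it is printed unchanged.
# Expected output: After-u, After (one per line)
printf '%s\n' 'After-u-math-how-however' 'After' | cut_after_second
```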

With GNU awk for gensub():

$ echo "After-u-math-how-however" | awk '{$0=gensub(/([^-]+-[^-]*).*/,"\\1","")}1'
After-u

This can also be done with non-GNU sed by using \( \) and * in place of ( ) and +, and with non-GNU awk by using match() and substr(), if necessary.
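A minimal sketch of those non-GNU versions, assuming only POSIX sed and awk features (the helper names posix_sed_version and posix_awk_version are hypothetical). In the sed version, [^-][^-]* stands in for [^-]+; in the awk version, match() sets RSTART and RLENGTH so substr() can keep the text up to the end of the match:

```shell
#!/bin/sh
# Hypothetical helper names; only POSIX sed/awk features are used.

posix_sed_version() {
  # BRE: \( \) groups, and [^-][^-]* means "one or more non-dashes".
  sed 's/\([^-][^-]*-[^-]*\).*/\1/'
}

posix_awk_version() {
  # On a match, keep everything up to the end of the matched text,
  # mirroring what the sed substitution leaves behind; then print (the "1").
  awk 'match($0, /[^-]+-[^-]*/) { $0 = substr($0, 1, RSTART + RLENGTH - 1) } 1'
}

# Each pipeline is expected to print: After-u, After (one per line)
printf '%s\n' 'After-u-math-how-however' 'After' | posix_sed_version
printf '%s\n' 'After-u-math-how-however' 'After' | posix_awk_version
```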

answered Nov 04 '22 13:11 by Ed Morton



awk -F - '{print $1 (NF>1? FS $2 : "")}' <<<'After-u-math-how-however'
  • Split the line into fields using - as the field separator (option -F -); inside the awk program, the separator is available as the special variable FS.
  • Always print the 1st field (print $1), followed by:
    • If there's more than 1 field (NF>1), append FS (i.e., -) and the 2nd field ($2)
    • Otherwise, append "", i.e., print only the 1st field (which may itself be empty if the input line is empty).
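The <<< here-string is supported by bash, ksh and zsh but not by POSIX sh. A sketch of the same awk program fed by printf instead, which also exercises the empty-input case from the last bullet (first_two_fields is a hypothetical helper name):

```shell
#!/bin/sh
# first_two_fields: hypothetical helper name; same program as above.
first_two_fields() {
  # NF > 1: re-join the 1st and 2nd fields with FS ("-");
  # otherwise print the 1st field alone (empty for an empty line).
  awk -F - '{ print $1 (NF > 1 ? FS $2 : "") }'
}

# Expected output: After-u, After, then an empty line
printf '%s\n' 'After-u-math-how-however' 'After' '' | first_two_fields
```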
answered Nov 04 '22 12:11 by mklement0
