 

How to use cut with multiple character delimiter in Unix?

Tags: sed, delimiter, cut

My file looks like this:

abc ||| xyz ||| foo bar hello world ||| spam ham jam ||| blah blah 

I want to extract a specific column. For example, with GNU sed (which understands \s and \t) I could turn the delimiter into a tab and use cut:

sed 's/\s|||\s/\t/g' file | cut -f1 
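A self-contained sketch of the sed + cut approach, assuming a hypothetical file named file.txt holding the sample line. It builds a literal tab with printf instead of relying on \t in the sed replacement, which GNU sed accepts but other seds may not.

```shell
#!/bin/sh
# Write the sample line from the question to a hypothetical file.txt
printf 'abc ||| xyz ||| foo bar hello world ||| spam ham jam ||| blah blah\n' > file.txt

# A literal tab character, so the sed script does not depend on GNU's \t
tab=$(printf '\t')

# Turn every " ||| " into one tab, then let cut split on tabs (its default)
sed "s/ ||| /$tab/g" file.txt | cut -f1   # prints: abc
sed "s/ ||| /$tab/g" file.txt | cut -f3   # prints: foo bar hello world
```

Swapping the delimiter for a tab works as long as the data itself contains no tabs.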

But is there another way of doing that?

asked Aug 22 '14 at 12:08 by alvas


People also ask

Can delimiter be more than one character?

Excel's Text to Columns feature lets you choose the delimiter used to split text into columns, but a custom delimiter is limited to a single character.
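cut has a similar one-character limit on -d, which is why the question needs a workaround. A quick check, using a hypothetical sample.txt (the exact error text differs between GNU and BSD cut, so stderr is discarded here):

```shell
#!/bin/sh
printf 'abc ||| xyz ||| foo\n' > sample.txt

# cut rejects a multi-character delimiter and exits with a non-zero status
if cut -d'|||' -f1 sample.txt 2>/dev/null; then
    echo "cut accepted the delimiter"
else
    echo "cut rejected the multi-character delimiter"
fi
```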

How do I specify a delimiter in awk?

The awk field separator (FS) controls how awk splits each record into fields. You can set it with the -F option or by assigning FS, typically in a BEGIN block. FS can be a single character or a regular expression; when it is a regular expression, awk splits the record wherever the input matches that expression, so a multi-character sequence such as " ||| " works as a separator.
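A minimal sketch of both ways to set FS, assuming a hypothetical file.txt with the sample line. The bracket form [|] is used so the regex needs no backslashes.

```shell
#!/bin/sh
printf 'abc ||| xyz ||| foo bar hello world ||| spam ham jam ||| blah blah\n' > file.txt

# FS set with -F: the regex " [|][|][|] " matches the literal text " ||| "
awk -F' [|][|][|] ' '{ print $3 }' file.txt               # prints: foo bar hello world

# The same FS assigned in a BEGIN block, before any record is read
awk 'BEGIN { FS = " [|][|][|] " } { print $3 }' file.txt   # prints: foo bar hello world
```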

How do you cut a tab delimited file in Unix?

Tab is the default delimiter for the cut command, and the -f option selects fields by that delimiter. You can override the delimiter with the -d option, but it must be a single character.
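A short sketch of both cases, with input generated by printf rather than read from a file:

```shell
#!/bin/sh
# Tab-separated input: TAB is cut's default delimiter, so -f alone is enough
printf 'abc\txyz\tfoo\n' | cut -f2          # prints: xyz

# Comma-separated input: override the delimiter with -d (one character only)
printf 'abc,xyz,foo\n' | cut -d',' -f1,3    # prints: abc,foo
```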


1 Answer

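Since | is a regex metacharacter (it means alternation), it needs to be escaped as \\| or put in square brackets as [|]. The backslash is doubled because awk processes escape sequences in the -F string before using it as a regex, turning \\| into \|.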
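How much escaping is needed depends on where the regex is written. In an awk regex constant (/.../) there is no string-processing step, so one backslash per pipe is enough; in the -F string it must be doubled. A sketch using split() on a hypothetical file.txt (the array name col is arbitrary):

```shell
#!/bin/sh
printf 'abc ||| xyz ||| foo bar hello world ||| spam ham jam ||| blah blah\n' > file.txt

# In a regex constant, \| is already a literal pipe: one backslash per pipe
awk '{ n = split($0, col, / \|\|\| /); print n, col[2] }' file.txt   # prints: 5 xyz

# In the -F string, awk first unescapes \\ to \, so each pipe needs \\|
awk -F' \\|\\|\\| ' '{ print NF, $2 }' file.txt                     # prints: 5 xyz
```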

You can do this:

awk -F' \\|\\|\\| ' '{print $1}' file 
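To pick the column at run time instead of hard-coding $1, a sketch passing the field number in with -v (col is a hypothetical variable name), plus a loop that prints every column with its number:

```shell
#!/bin/sh
printf 'abc ||| xyz ||| foo bar hello world ||| spam ham jam ||| blah blah\n' > file.txt

# $col selects the field whose number the variable col holds
awk -v col=4 -F' \\|\\|\\| ' '{ print $col }' file.txt     # prints: spam ham jam

# Print every column on its own line by looping up to NF
awk -F' \\|\\|\\| ' '{ for (i = 1; i <= NF; i++) print i ": " $i }' file.txt
```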

Some other variations that work as well (the {3} forms need an awk that supports regex interval expressions, such as gawk):

awk -F' [|][|][|] ' '{print $1}' file
awk -F' [|]{3} ' '{print $1}' file
awk -F' \\|{3} ' '{print $1}' file
awk -F' \\|+ ' '{print $1}' file
awk -F' [|]+ ' '{print $1}' file
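A quick way to confirm the variations agree is to run them against the same input. This sketch uses a hypothetical file.txt and leaves out the {3} forms for portability. Note that the + forms are looser than the others: they would also split on a single " | ".

```shell
#!/bin/sh
printf 'abc ||| xyz ||| foo bar hello world ||| spam ham jam ||| blah blah\n' > file.txt

# Each value reaches -F unchanged; awk then unescapes \\| to \|
for fs in ' [|][|][|] ' ' \\|\\|\\| ' ' \\|+ ' ' [|]+ '; do
    awk -F"$fs" '{ print $2 }' file.txt    # prints: xyz (once per variant)
done
```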

Using \ as the separator is trickier: square brackets don't make it much simpler, so it comes down to escaping, and a lot of it :)

cat file
abc \\\ xyz \\\ foo bar

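You need four \ for every \ you want to match, so 12 \ in total for the three: the shell passes them through unchanged inside single quotes, awk's string processing halves them to 6, and the regex engine reads each \\ as one literal \.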

awk -F' \\\\\\\\\\\\ ' '{print $2}' file
xyz

or

awk -F' \\\\{3} ' '{print $2}' file
xyz

or with square brackets, though it's not much simpler:

awk -F' [\\\\]{3} ' '{print $2}' file
xyz
awk -F' [\\\\][\\\\][\\\\] ' '{print $2}' file
xyz
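The backslash counts come from two rounds of unescaping: awk processes escape sequences in the -F string (and in -v assignments), then the regex engine processes them again. The sketch below shows the first round on its own, then an alternative that is not from the original answer: reading the separator from ENVIRON, whose values are not escape-processed, so only the regex level applies and 6 backslashes suffice for 3. SEP and bs.txt are hypothetical names.

```shell
#!/bin/sh
# Round one: awk unescapes -v values, so 4 backslashes become 2
awk -v s='\\\\' 'BEGIN { print s }'          # prints: \\

# Sample input with three literal backslashes between columns
printf 'abc \\\\\\ xyz \\\\\\ foo bar\n' > bs.txt

# ENVIRON values are used as-is, so the regex needs only \\ per literal \
SEP=' \\\\\\ ' awk 'BEGIN { FS = ENVIRON["SEP"] } { print $2 }' bs.txt   # prints: xyz
```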
answered Oct 02 '22 at 03:10 by Jotne