
How to make the 'cut' command treat consecutive identical delimiters as one?

I'm trying to extract a certain field (the fourth) from a column-based, space-aligned text stream. I'm using the cut command in the following manner:

cat text.txt | cut -d " " -f 4

Unfortunately, cut doesn't treat several spaces as one delimiter. I could pipe the text through awk

awk '{ print $4 }'

or sed

sed -E "s/[[:space:]]+/ /g"

to collapse the spaces, but I'd like to know if there is any way to make cut itself handle repeated delimiters natively.

mbaitoff asked Nov 10 '10 10:11




2 Answers

Try:

tr -s ' ' <text.txt | cut -d ' ' -f4 

From the tr man page:

 -s, --squeeze-repeats
        replace each input sequence of a repeated character
        that is listed in SET1 with a single occurrence
        of that character
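One caveat worth checking before relying on this: tr -s ' ' squeezes runs of spaces but does not remove them, so a line that starts with spaces still begins with one space, and cut then counts an empty first field. A minimal sketch, assuming an invented sample line and two hypothetical helper names (squeezed_field, trimmed_field) that are not part of the answer above:

```shell
# Hypothetical sample line (not from the question): leading spaces plus
# runs of spaces between the words.
line='   alpha   beta    gamma     delta'

# Squeeze repeated spaces, then cut field $2 out of the line passed as $1.
squeezed_field() {
    printf '%s\n' "$1" | tr -s ' ' | cut -d ' ' -f "$2"
}

# Same, but strip leading spaces first so that field 1 is the first word.
trimmed_field() {
    printf '%s\n' "$1" | sed 's/^ *//' | tr -s ' ' | cut -d ' ' -f "$2"
}

squeezed_field "$line" 4   # prints "gamma": field 1 is the empty string before the leading space
trimmed_field "$line" 4    # prints "delta"
```

Tabs are not touched by tr -s ' ', so this only applies to input aligned with spaces, as in the question.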
kev answered Oct 01 '22 13:10


As you note in your question, awk is really the way to go. Using cut is possible when it is combined with tr -s to squeeze the spaces, as kev's answer shows.

Let me, however, go through all the possible combinations for future readers. Explanations are in the Tests section.

tr | cut

tr -s ' ' < file | cut -d' ' -f4 

awk

awk '{print $4}' file 

bash

while read -r _ _ _ myfield _
do
   echo "fourth field: $myfield"
done < file

sed

sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' file 

Tests

Given this file, let's test the commands:

$ cat a
this   is    line     1 more text
this      is line    2     more text
this    is line 3     more text
this is   line 4            more    text

tr | cut

$ cut -d' ' -f4 a
is
                        # it does not show what we want!


$ tr -s ' ' < a | cut -d' ' -f4
1
2                       # this makes it!
3
4
$

awk

$ awk '{print $4}' a
1
2
3
4
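The reason awk needs no squeezing step is its default field separator: with FS left at a single space, awk splits on runs of blanks (spaces and tabs) and ignores leading and trailing ones. As a rough illustration, the sketch below contrasts that default with -F'[ ]', a one-character regular expression that makes every single space a separator and so reproduces what cut does. The function names are hypothetical, chosen only for this example:

```shell
# Default FS: a run of spaces/tabs is one separator, leading blanks are ignored.
awk_fourth() {
    awk '{ print $4 }'
}

# FS given as the regex "[ ]": each individual space separates fields,
# which mimics cut -d ' '.
awk_fourth_every_space() {
    awk -F'[ ]' '{ print $4 }'
}

printf 'this   is    line     1 more text\n' | awk_fourth               # prints "1"
printf 'this   is    line     1 more text\n' | awk_fourth_every_space   # prints "is"
```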

bash

This reads the fields sequentially. Using _ as the variable name marks it as a throwaway ("junk") variable for the fields we want to ignore. This way, the named variable (myfield above, a in this test) holds the 4th field of each line, no matter how many spaces separate the fields.

$ while read -r _ _ _ a _; do echo "4th field: $a"; done < a
4th field: 1
4th field: 2
4th field: 3
4th field: 4
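Because read splits on the characters in IFS (space, tab and newline by default), the same loop also copes with tabs and leading blanks without any preprocessing. A small sketch that wraps the loop in a function; the name fourth_field, the placeholder variable names and the sample input are all assumptions made for this example:

```shell
# Print the 4th whitespace-separated field of every line on stdin.
# f1..f3 and rest are placeholders for the fields we skip.
fourth_field() {
    while read -r f1 f2 f3 field rest; do
        printf '%s\n' "$field"
    done
}

# Leading spaces and a tab between fields are handled by the default IFS.
printf '  a \t b   c    d e f\n' | fourth_field   # prints "d"
```

Note that read returns a non-zero status for a final line that lacks a trailing newline, so this loop does not print a field for such a line.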

sed

This matches three groups of non-space characters followed by spaces with ([^ ]*[ ]*){3}. Then it captures everything up to the next space as the 4th field, which is finally printed with \2.

$ sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' a
1
2
3
4
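One detail of the expression above may matter in practice: every part of it can match the empty string, so a line with leading spaces can shift the result, since the first group is able to match those leading spaces. One possible variant, assuming a sed that supports -E (GNU and BSD sed both do), skips leading spaces, requires non-empty fields, and prints only the lines that actually have a 4th field. The name sed_fourth is hypothetical:

```shell
# -n suppresses the default output; the trailing p prints only lines where
# the substitution succeeded, so lines without a 4th field print nothing.
sed_fourth() {
    sed -nE 's/^ *([^ ]+ +){3}([^ ]+).*/\2/p'
}

printf '  this   is    line     1 more text\n' | sed_fourth   # prints "1"
printf 'too short\n' | sed_fourth                              # prints nothing
```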
fedorqui 'SO stop harming' answered Oct 01 '22 14:10