I'm trying to extract a certain (the fourth) field from the column-based, 'space'-adjusted text stream. I'm trying to use the cut
command in the following manner:
cat text.txt | cut -d " " -f 4
Unfortunately, cut
doesn't treat several spaces as one delimiter. I could have piped through awk
awk '{ printf $4; }'
or sed
sed -E "s/[[:space:]]+/ /g"
to collapse the spaces, but I'd like to know if there any way to deal with cut
and several delimiters natively?
1) The cut command is used to display selected parts of file content in UNIX. 2) The default delimiter in cut command is "tab", you can change the delimiter with the option "-d" in the cut command. 3) The cut command in Linux allows you to select the part of the content by bytes, by character, and by field or column.
Cut Based on a Delimiter You can use any character as a delimiter. Using the cut command to extract fields from a file without specifying the -d option means that the default delimiter is the tab character.
For delimiter separated fields, the -d option is used. The default delimiter is the tab character. This command will extract the second and sixth field from each line, using the ',' character as the delimiter.
Try:
tr -s ' ' <text.txt | cut -d ' ' -f4
From the tr
man page:
-s, --squeeze-repeats replace each input sequence of a repeated character that is listed in SET1 with a single occurrence of that character
As you comment in your question, awk
is really the way to go. To use cut
is possible together with tr -s
to squeeze spaces, as kev's answer shows.
Let me however go through all the possible combinations for future readers. Explanations are at the Test section.
tr -s ' ' < file | cut -d' ' -f4
awk '{print $4}' file
while read -r _ _ _ myfield _ do echo "forth field: $myfield" done < file
sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' file
Given this file, let's test the commands:
$ cat a this is line 1 more text this is line 2 more text this is line 3 more text this is line 4 more text
$ cut -d' ' -f4 a is # it does not show what we want! $ tr -s ' ' < a | cut -d' ' -f4 1 2 # this makes it! 3 4 $
$ awk '{print $4}' a 1 2 3 4
This reads the fields sequentially. By using _
we indicate that this is a throwaway variable as a "junk variable" to ignore these fields. This way, we store $myfield
as the 4th field in the file, no matter the spaces in between them.
$ while read -r _ _ _ a _; do echo "4th field: $a"; done < a 4th field: 1 4th field: 2 4th field: 3 4th field: 4
This catches three groups of spaces and no spaces with ([^ ]*[ ]*){3}
. Then, it catches whatever coming until a space as the 4th field, that it is finally printed with \1
.
$ sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' a 1 2 3 4
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With