<p>I'm trying to extract one field (the fourth) from a column-based text stream whose columns are padded with a variable number of spaces. I'm using the <code>cut</code> command as follows:</p>
<p><code>cat text.txt | cut -d " " -f 4</code></p>
<p>Unfortunately, <code>cut</code> doesn't treat several consecutive spaces as a single delimiter. I could extract the field with awk</p>
<p><code>awk '{ print $4 }'</code></p>
<p>or collapse the spaces first with sed</p>
<p><code>sed -E "s/[[:space:]]+/ /g"</code></p>
<p>but I'd like to know whether there is any way to make <code>cut</code> handle repeated delimiters natively.</p>


How to make the 'cut' command treat the same sequential delimiters as one?

2 Answers

Try:

tr -s ' ' <text.txt | cut -d ' ' -f4

From the tr man page:

-s, --squeeze-repeats
    replace each input sequence of a repeated character
    that is listed in SET1 with a single occurrence
    of that character
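As a small illustration of the squeeze step, here is a sketch that uses inline sample lines instead of text.txt. The sample text and the fourth_field helper name are made up for this example. It also shows one caveat worth knowing: tr -s only collapses a run of spaces to a single space, so a line that starts with spaces still has an empty first field as far as cut is concerned.

```shell
# Hypothetical helper: squeeze runs of spaces, then let cut pick field 4.
fourth_field() {
    tr -s ' ' | cut -d ' ' -f4
}

# Runs of spaces between columns collapse to one delimiter each.
printf 'this   is    line     1 more text\n' | fourth_field

# Caveat: a leading run is squeezed to ONE space, not removed,
# so cut sees an empty field 1 and every column shifts by one.
printf '   this is line 1 more text\n' | fourth_field
```

The first command prints the fourth column as intended. The second prints the third word, because the empty leading field counts as field 1.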

answered Oct 01 '22 13:10

kev

As you note in your question, awk is really the way to go. Using cut is possible if you combine it with tr -s to squeeze the spaces, as kev's answer shows.

Let me nevertheless go through the possible approaches for future readers. Explanations are in the Tests section.

tr | cut

tr -s ' ' < file | cut -d' ' -f4

awk

awk '{print $4}' file

bash

while read -r _ _ _ myfield _
do
   echo "fourth field: $myfield"
done < file

sed

sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' file

Tests

Given this file, let's test the commands:

$ cat a
this   is    line     1 more text
this      is line    2     more text
this    is line 3     more text
this is   line 4            more    text

tr | cut

$ cut -d' ' -f4 a
is
                        # it does not show what we want!


$ tr -s ' ' < a | cut -d' ' -f4
1
2                       # this makes it!
3
4
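To see why plain cut prints "is" for the first line, it can help to make every delimiter visible. The sketch below copies the first line of the test file into a variable and swaps each space for a comma. The variable name and the comma are choices made only for this illustration.

```shell
# First line of the test file, with its original spacing.
line='this   is    line     1 more text'

# Show every single space as a comma: each extra space adds an empty field.
printf '%s\n' "$line" | tr ' ' ','

# cut counts those empty fields, so field 4 is "is", not "1".
printf '%s\n' "$line" | cut -d ' ' -f4
```

Between "this" and "is" there are three spaces, which cut reads as two empty fields, so "is" ends up as field 4.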

awk

$ awk '{print $4}' a
1
2
3
4
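awk works here because its default field splitting treats any run of spaces or tabs as one separator and ignores leading blanks. A minimal sketch of that behaviour follows. The nth_field helper and the sample line are hypothetical and only serve the example.

```shell
# Hypothetical helper: print field number $1 of every input line.
nth_field() {
    awk -v n="$1" '{ print $n }'
}

# Leading spaces and a tab between columns do not shift the fields.
printf '   this \t is   line 1 more text\n' | nth_field 4
```

Passing the field number with -v keeps the awk program itself constant, so the same helper can be reused for any column.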

bash

This reads the fields of each line sequentially. Naming a variable _ marks it as a throwaway ("junk") variable, so those fields are ignored. This way, $myfield holds the 4th field of each line, no matter how many spaces separate the fields.

$ while read -r _ _ _ myfield _; do echo "4th field: $myfield"; done < a
4th field: 1
4th field: 2
4th field: 3
4th field: 4
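The loop relies on how read splits a line: with the default IFS, runs of spaces or tabs count as one separator, and the last variable receives whatever is left of the line. A small sketch of that second point, using a made-up sample line and a hypothetical fourth_and_rest function:

```shell
# Hypothetical helper: report the 4th field and everything after it.
fourth_and_rest() {
    # The first three fields go to the throwaway variable _;
    # "rest" is the last name, so it gets the remainder of the line.
    while read -r _ _ _ myfield rest
    do
        echo "fourth field: $myfield"
        echo "remainder: $rest"
    done
}

printf 'this   is    line     1 more text\n' | fourth_and_rest
```

This is why the loop ends with a final _: without it, the fourth variable would be the last one and would swallow the rest of the line.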

sed

This matches three groups of non-space characters followed by spaces with ([^ ]*[ ]*){3}. It then captures everything up to the next space as the 4th field, which is finally printed with \2.

$ sed -r 's/^([^ ]*[ ]*){3}([^ ]*).*/\2/' a
1
2
3
4
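The expression above only looks at literal spaces. As a variation (not part of the command shown above), the following sketch uses the [[:space:]] class so that tabs and leading blanks are handled too, and spells the extended-regex flag as -E, which both GNU and BSD sed accept. The fourth_field_sed name and the sample line are assumptions made for the example.

```shell
# Hypothetical helper: keep only the 4th whitespace-separated field.
fourth_field_sed() {
    # Skip leading blanks, skip three "word + blanks" groups,
    # capture the next word as group 2, and drop the rest of the line.
    sed -E 's/^[[:space:]]*([^[:space:]]+[[:space:]]+){3}([^[:space:]]+).*/\2/'
}

printf '  this \t is   line 1 more text\n' | fourth_field_sed
```

Note that a line with fewer than four fields does not match the pattern, so sed prints it unchanged.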

answered Oct 01 '22 14:10

fedorqui 'SO stop harming'


Tags: bash, unix, delimiter, cut

mbaitoff
