Extract column using grep

I have a data frame with >100 columns each labeled with a unique string. Column 1 represents the index variable. I would like to use a basic UNIX command to extract the index column (column 1) + a specific column string using grep.

For example, if my data frame looks like the following:

Index  A  B  C...D  E  F
p1     1  7  4   2  5  6
p2     2  2  1   2  .  3
p3     3  3  1   5  6  1

I would like to use some command to extract only column "X" which I will specify with grep, and display both column 1 & the column I grep'd. I know that I can use cut -f1 myfile for the first bit, but need help with the grep per column. As a more concrete example, if my grep phrase were "B", I would like the output to be:

Index  B
p1     7
p2     2
p3     3

I am new to UNIX, and have not found much in similar examples. Any help would be much appreciated!!

asked Sep 17 '16 20:09

AMS

2 Answers

You need to use awk:

awk '{print $1,$3}' <namefile>

This simple command prints the first ($1) and third ($3) columns of the file. awk can do much more than this, so its man page is worth a look.
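The question asks for a column chosen by its header name rather than by a fixed position. Here is a minimal sketch of one way to do that with awk alone. It assumes the file is tab-separated (as the `cut -f1` in the question suggests) and that header names are unique. The sample data and the variable names `datafile`, `columnname` and `result` are illustrative, not from the original posts.

```shell
# Illustrative tab-separated sample modelled on the question's data frame.
datafile=$(mktemp)
printf 'Index\tA\tB\tC\n' > "$datafile"
printf 'p1\t1\t7\t4\np2\t2\t2\t1\np3\t3\t3\t1\n' >> "$datafile"

columnname=B   # header to look up (hypothetical variable name)

# On the header line, remember the number of the field whose text equals
# the requested name; then print field 1 and that field for every line.
# If the name is not found, n stays unset and nothing is printed.
result=$(awk -F'\t' -v col="$columnname" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) n = i }
    n       { print $1, $n }
' "$datafile")

printf '%s\n' "$result"
rm -f "$datafile"
```

Because the header is compared with `==`, a name such as `B` does not accidentally match a longer header that merely contains it.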

A nice combo is using grep and awk with a pipe. The following code will print column 1 and 3 of only the lines of your file that contain 'p1':

grep 'p1' <namefile> | awk '{print $1,$3}'

If, instead, you want to select lines by line number you can replace grep with sed:

sed -n 1p <namefile> | awk '{print $1,$3}'

Actually, awk can be used alone in all the examples:

awk '/p1/{print $1,$3}' <namefile> # will print only lines containing p1
awk '{if(NR == 1){print $1,$3}}' <namefile> # will print only the first line

answered Oct 13 '22 02:10

Riccardo Petraglia

First figure out the command to find the column number.

columnname=C
sed -n "1 s/${columnname}.*//p" datafile | sed 's/[^\t*]//g' | wc -c

Once you know the number, use cut

cut -f1,3 < datafile

Combine into one command

cut -f1,$(sed -n "1 s/${columnname}.*//p" datafile |
    sed 's/[^\t*]//g' | wc -c) < datafile

Finished? Not quite. If one header can be a substring of another, the first sed command needs improving: include the tabs in your match and put the tabs back in the replacement string.
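One possible way around the substring problem, shown here as a sketch rather than the exact fix described above, is to split the header line into one name per line and let `grep -x` find the whole-line match. It assumes tab-separated data and unique header names. The sample file and the names `colnum` and `selected` are made up for illustration.

```shell
# Illustrative sample where one header ("AB") contains another ("B").
datafile=$(mktemp)
printf 'Index\tA\tAB\tB\n' > "$datafile"
printf 'p1\t1\t9\t7\np2\t2\t8\t2\n' >> "$datafile"

columnname=B

# Put each header on its own line, then ask grep for the line number (-n)
# of the exact whole-line match (-x), treating the name as a fixed string (-F).
colnum=$(head -n 1 "$datafile" | tr '\t' '\n' | grep -nxF -- "$columnname" | cut -d: -f1)

if [ -n "$colnum" ]; then
    # Column 1 plus the column that was found.
    selected=$(cut -f "1,$colnum" "$datafile")
    printf '%s\n' "$selected"
else
    echo "no column named $columnname" >&2
fi
rm -f "$datafile"
```

With this sample, `grep` reports line 4 for `B` and skips `AB`, so `cut` is asked for fields 1 and 4.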

answered Oct 13 '22 01:10

Walter A
