How to get column index of field in unix shell

Tags: shell, unix, csv, sed

I have a csv file with headers:

a,b,c,d,e,f,g,h

I would like to do something like this:

cat abc.csv | sed "something to split them" | grep "e"  

#position of "e"

Can someone show me how to get the column index of the header 'e'?

aceminer asked Sep 30 '16 00:09


1 Answer

Assuming your goal is to say "which column is this value in", you have a number of options, but this works:

sed -n $'1s/,/\\\n/gp' abc.csv | grep -nx 'e'
#output: 5:e

If you want to get just the number out of that:

sed -n $'1s/,/\\\n/gp' abc.csv | grep -nx 'e' | cut -d: -f1
#output: 5

Explanation:

Since the headers are on the first line of the file, we use the -n option to tell sed not to print every line by default. We then give it an expression that starts with 1, meaning it is executed only on the first line, and ends with the p flag, meaning the line is printed once the substitution has been made.

The expression uses ANSI-C quoting ($'...') simply so it's easier to read: you can put a newline in it with \n instead of having to include a literal newline. Inside $'...', \\ becomes a single backslash and \n becomes a newline, so by the time the shell is done with it, the expression $'1s/,/\\\n/gp' reaches sed as this two-line string:

1s/,/\
/gp

That tells sed to replace every comma on the first line with a newline and then print out the result. The output of just the sed on your example would be this:

a
b
c
d
e
f
g
h
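
To check what sed actually receives, you can compare the ANSI-C quoted expression with the same expression written with a literal newline. This is only a small sketch, assuming bash (or another shell that supports $'...'); the variable names ansi_expr and literal_expr are made up for illustration.

```shell
# Two spellings of the same sed expression (variable names are illustrative).
# ANSI-C quoting: \\ becomes a backslash, \n becomes a newline.
ansi_expr=$'1s/,/\\\n/gp'
# Plain single quotes with a literal newline after the backslash.
literal_expr='1s/,/\
/gp'

# The two strings are identical once the shell has parsed them.
if [ "$ansi_expr" = "$literal_expr" ]; then
    echo "same expression"
fi

# Running it on the example header line prints one field per line.
printf 'a,b,c,d,e,f,g,h\n' | sed -n "$ansi_expr"
```

Either spelling can be passed to sed; the $'...' form just keeps the command on one line.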

(If your CSV file has many lines, you may want to add ;q to the end of the sed expression, just before the closing quote, so that sed quits after the first line instead of reading the remaining lines and doing nothing with them.)
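
As a rough sketch of that variant, here is the full pipeline with ;q added. It assumes bash for the $'...' quoting, and it builds a throwaway file with mktemp to stand in for a larger abc.csv; the data rows and the variable names csv_file and col are invented for the example.

```shell
# A temporary file stands in for a larger abc.csv (data rows are made up).
csv_file=$(mktemp)
printf 'a,b,c,d,e,f,g,h\n1,2,3,4,5,6,7,8\n9,8,7,6,5,4,3,2\n' > "$csv_file"

# Same pipeline as before, but the trailing ;q makes sed stop after line 1.
col=$(sed -n $'1s/,/\\\n/gp;q' "$csv_file" | grep -nx 'e' | cut -d: -f1)
echo "$col"

rm -f "$csv_file"
```

The result is the same as without ;q; the only difference is that sed stops reading once the header line has been handled.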

We then pipe that output through a grep command looking for e. We pass the -x option so that it only matches lines consisting of exactly 'e', not just any line containing an 'e' (Thanks @Marcel and @Sundeep), plus the -n option that tells it to include the line number of matching lines in its output. In the example, it outputs 5:e, where the 5: says that the rest of the output is from the 5th line of the input.

We can then pipe that through cut with a field delimiter (-d) of : to extract just the first field (-f1). That field is the line number in the sed output, which is also the field number in the original file.
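
If you need this more than once, the three steps can be wrapped in a small function. The sketch below is one way to do it, not the only one: the name col_index and its argument order are hypothetical, it assumes bash, and it adds grep's -F option so the header is matched as a fixed string rather than as a regular expression.

```shell
# Hypothetical helper (the name col_index is not part of the answer above):
# prints the 1-based column number of a header found on a file's first line.
col_index() {
    local header=$1 file=$2
    # sed: split line 1 on commas, one field per line, then quit.
    # grep: -n numbers the lines, -x matches whole lines, -F disables regex.
    # cut: keep only the line number in front of the colon.
    sed -n $'1s/,/\\\n/gp;q' "$file" | grep -nxF -- "$header" | cut -d: -f1
}

# Example usage against a throwaway file that mirrors the question's header.
tmp_csv=$(mktemp)
printf 'a,b,c,d,e,f,g,h\n' > "$tmp_csv"
col_index e "$tmp_csv"
rm -f "$tmp_csv"
```

Like the original pipeline, this assumes a simple header line: it splits on every comma, so quoted fields that contain commas are not handled, and it prints nothing if the header is not found.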

Mark Reed answered Oct 17 '22 21:10