Sorry if this is too basic. I have a CSV file where the columns have a header row (v1, v2, etc.). I understand that to extract columns 1 and 2, I can do:
awk -F "," '{print $1 "," $2}' infile.csv > outfile.csv
But what if I have to extract, say, columns 1 to 10, 20 to 25, and 30 and 33? As an addendum, is there any way to extract columns directly by their header names rather than by column numbers?
cut is another command-line utility worth considering. You need to specify the delimiter (-d) because some files use spaces, tabs, or colons rather than commas to separate columns. From its synopsis:
cut -b list [-n] [file ...]
cut -c list [file ...]
cut -f list [-d delim] [-s] [file ...]
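For example, to pull the login name and home directory (fields 1 and 6) from the colon-delimited /etc/passwd:
cut -d: -f1,6 /etc/passwd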
The awk field separator (FS) controls how awk splits a record into fields. It can be a single character or a regular expression; once you set FS to a regular expression, awk splits each input record wherever that pattern matches.
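For instance, a character class as FS splits on either delimiter (mixed.txt here is just an illustrative file name):
awk -F '[,;]' '{print $1, $3}' mixed.txt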
I don't know if it's possible to do ranges in awk. You could do a for loop, but you would have to add handling to filter out the columns you don't want. It's probably easier to do this:
awk -F, '{OFS=",";print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$20,$21,$22,$23,$24,$25,$30,$33}' infile.csv > outfile.csv
Something else to consider, and this is faster and more concise:
cut -d "," -f1-10,20-25,30-33 infile.csv > outfile.csv
As to the second part of your question, I would probably write a script in perl that knows how to handle header rows, parsing the column names from stdin or a file and then doing the filtering. It's probably a tool I would want to have around for other things. I am not sure about doing it in a one-liner, although I am sure it can be done.
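That said, here is a rough awk sketch of the idea (awk rather than the perl script I described): read the header row, map each name to its column number, then print the requested names in order. The cols list (v1, v5, v20) is just an example of names you might ask for:
awk -F, -v OFS=, -v cols="v1,v5,v20" '
NR == 1 {
  # remember which column each header name lives in
  for (i = 1; i <= NF; i++) pos[$i] = i
  n = split(cols, want, ",")
}
{
  line = ""
  for (j = 1; j <= n; j++) line = line (j > 1 ? OFS : "") $(pos[want[j]])
  print line
}' infile.csv > outfile.csv
Because the header line also falls through to the second block, the selected header names are printed on the first output line as well.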