Extracting columns from text file with different delimiters in Linux

Tags: linux

I have very large genotype files that are basically impossible to open in R, so I am trying to extract the rows and columns of interest using the Linux command line. Rows are straightforward enough using head/tail, but I'm having difficulty figuring out how to handle the columns.

If I attempt to extract (say) the 100th through 105th tab- or space-delimited columns using

 cut -c100-105 myfile >outfile 

this obviously won't work if there are strings of multiple characters in each column. Is there some way to modify cut with appropriate arguments so that it extracts the entire string within a column, where columns are defined as space or tab (or any other character) delimited?

user1815498 asked Nov 13 '13 16:11


1 Answer

If the command should work with both tabs and spaces as the delimiter, I would use awk, which by default splits fields on any run of spaces or tabs:

awk '{print $100,$101,$102,$103,$104,$105}' myfile > outfile 
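
To see why awk copes with mixed delimiters, here is a small sketch on a made-up three-line sample (the sample data and the variable name cols are illustrative assumptions, not from the question). It picks fields 2 and 3 rather than 100 to 105 so the result is easy to check by eye:

```shell
# Hypothetical sample: columns are separated by a mix of tabs and runs of spaces.
cols=$(printf 'id\tA1   A2\tA3\nrs1  AA\tAG   GG\nrs2\tCC CT\t TT\n' |
    awk '{print $2, $3}')   # default splitting treats any run of blanks as one separator

printf '%s\n' "$cols"
```

Each output line should hold the second and third field separated by a single space, whichever delimiter the input used.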

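As long as you only need six fields, it is fine to type them out. For longer ranges you can use a for loop; printf keeps each row's fields on one line, whereas a bare print $i would put every field on its own line:

awk '{for(i=100;i<=105;i++) printf "%s%s", $i, (i<105 ? OFS : ORS)}' myfile > outfile 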
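
If the range changes often, the loop bounds can be passed in with awk's -v option instead of being hard-coded. The following is a minimal sketch, assuming a helper function named extract_fields and variables first and last, all of which are hypothetical names chosen for illustration:

```shell
# extract_fields FIRST LAST [FILE...]
# Prints fields FIRST..LAST of every line, space-separated, one input row per output row.
extract_fields() {
    first=$1; last=$2; shift 2
    awk -v first="$first" -v last="$last" '{
        for (i = first; i <= last; i++)
            printf "%s%s", $i, (i < last ? OFS : ORS)   # OFS between fields, ORS after the last
    }' "$@"
}

# Small made-up input; with the real data this would be: extract_fields 100 105 myfile > outfile
range=$(printf 'a b c d e\n1 2 3 4 5\n' | extract_fields 2 4)
printf '%s\n' "$range"
```

With no file arguments the function reads standard input, so it can also sit at the end of a head/tail pipeline that selects the rows.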

If you want to use cut, you need to use the -f option:

cut -f100-105 myfile > outfile 
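
One thing to watch with cut -f: a line that contains no delimiter at all is passed through unchanged unless -s is given. A short sketch on made-up input (the variable names tabbed and strict are illustrative only):

```shell
# Hypothetical sample: the first line is tab-delimited, the second uses spaces only.
tabbed=$(printf 'a\tb\tc\td\nw x y z\n' | cut -f2-3)
printf '%s\n' "$tabbed"    # the space-delimited line comes through whole

# -s suppresses lines that contain no delimiter.
strict=$(printf 'a\tb\tc\td\nw x y z\n' | cut -s -f2-3)
printf '%s\n' "$strict"
```

So if the output looks like complete, uncut lines, the file is probably not tab-delimited.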

If the field delimiter is different from TAB you need to specify it using -d:

cut -d' ' -f100-105 myfile > outfile 
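
Note that cut counts every single space as a delimiter, so columns padded with several spaces shift the field numbering. One possible workaround, sketched here on a made-up line, is to squeeze runs of spaces with tr -s first (the variable names are illustrative assumptions):

```shell
# Hypothetical line with two spaces between the first and second column.
line='rs1  AA AG GG'

# Each space is a delimiter, so the double space yields an empty field 2.
naive=$(printf '%s\n' "$line" | cut -d' ' -f2-3)

# Squeezing repeated spaces restores the expected field numbering.
squeezed=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f2-3)

printf '[%s]\n[%s]\n' "$naive" "$squeezed"
```

If the file mixes tabs and spaces, the awk approach above is likely the simpler choice.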

Check the man page for more info on the cut command.

hek2mgl answered Sep 21 '22 00:09