I have a data file that looks like this:
chr1 762440 762981 SAMD11
chr1 858932 859148 KLHL17 SAMD11 NOC2L
chr1 859786 860145 KLHL17 SAMD11 NOC2L
chr1 890663 891747 KLHL17 NOC2L SAMD11 HES4
I want to put each name on its own line, preceded by the values from the first three columns.
Something like this:
chr1 762440 762981 SAMD11
chr1 858932 859148 KLHL17
chr1 858932 859148 SAMD11
chr1 858932 859148 NOC2L
chr1 859786 860145 KLHL17
chr1 859786 860145 SAMD11
chr1 859786 860145 NOC2L
This output covers only the first three input lines, but I need it for the entire file.
Please keep in mind that the number of names on each line is not fixed (it can be 1, 5, 10 or 20 names).
What I tried:
I used sed -i .bak to place the names one below the other along with the values from the first three columns, but in the end it became overly complicated.
Could you suggest a simpler way to do this?
Thank you
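Using awk, loop over fields 4 through NF (the number of fields on the current line) and print each one after the first three columns:
awk '{for (i=4;i<=NF;i++) print $1,$2,$3,$i}' file
Output: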
chr1 762440 762981 SAMD11
chr1 858932 859148 KLHL17
chr1 858932 859148 SAMD11
chr1 858932 859148 NOC2L
chr1 859786 860145 KLHL17
chr1 859786 860145 SAMD11
chr1 859786 860145 NOC2L
chr1 890663 891747 KLHL17
chr1 890663 891747 NOC2L
chr1 890663 891747 SAMD11
chr1 890663 891747 HES4
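The sample data looks like it could be a tab-separated, BED-style file, though the question does not say so. If that is the case, the one-liner above still splits the fields correctly but prints them separated by single spaces. A minimal sketch of a variant that writes tabs instead, assuming tab-separated output is wanted; the function name expand_names is made up for illustration:

```shell
# Hypothetical wrapper: prints one line per name, keeping the first three columns.
# Assumption: output should be tab-separated (OFS is set to a tab).
# Reads the files given as arguments, or standard input if there are none.
expand_names() {
    awk -v OFS='\t' '{ for (i = 4; i <= NF; i++) print $1, $2, $3, $i }' "$@"
}

# Usage with two lines from the question's sample data.
printf '%s\n' \
    'chr1 762440 762981 SAMD11' \
    'chr1 858932 859148 KLHL17 SAMD11 NOC2L' | expand_names
```

Lines with fewer than four fields produce no output, because the loop body never runs for them.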
Here's how I'd do it in Perl:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
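while (<DATA>) {
    chomp;
    # Split the line on whitespace into fields
    my @line = split;
    # Print the first three fields once for every remaining name
    for my $field (@line[3 .. $#line]) {
        say "@line[0 .. 2] $field";
    }
}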
__END__
chr1 762440 762981 SAMD11
chr1 858932 859148 KLHL17 SAMD11 NOC2L
chr1 859786 860145 KLHL17 SAMD11 NOC2L
chr1 890663 891747 KLHL17 NOC2L SAMD11 HES4