Organising Data

Question

I have a data file which looks like this:

chr1 762440 762981 SAMD11 
chr1 858932 859148 KLHL17 SAMD11 NOC2L 
chr1 859786 860145 KLHL17 SAMD11 NOC2L
chr1 890663 891747 KLHL17 NOC2L  SAMD11  HES4

I want to arrange all the names one below the other, each with the values from the first three columns.

Something like this:

chr1 762440 762981 SAMD11 
chr1 858932 859148 KLHL17
chr1 858932 859148 SAMD11 
chr1 858932 859148 NOC2L 
chr1 859786 860145 KLHL17 
chr1 859786 860145 SAMD11 
chr1 859786 860145 NOC2L

This output is for the first three lines, but I want it for the entire data set.

The number of names on each line is not fixed; please keep that in mind (it can be 1, 5, 10 or 20 names).

What I thought

Use sed -i .bak to place the names one below the other, along with the values in the first three columns.

But in the end it became overly complicated.

Could you please think of a simpler way to get around this?

Thank you

Jotne · Accepted Answer

Using awk

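awk '{for (i=4;i<=NF;i++) print $1,$2,$3,$i}' file

This loops over fields 4 through NF (the number of fields on the current line) and prints the first three columns followed by each name in turn. Output:
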
chr1 762440 762981 SAMD11
chr1 858932 859148 KLHL17
chr1 858932 859148 SAMD11
chr1 858932 859148 NOC2L
chr1 859786 860145 KLHL17
chr1 859786 860145 SAMD11
chr1 859786 860145 NOC2L
chr1 890663 891747 KLHL17
chr1 890663 891747 NOC2L
chr1 890663 891747 SAMD11
chr1 890663 891747 HES4

Dave Cross · Answer

Here's how I'd do it in Perl:

#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

while (<DATA>) {
  chomp;
  my @line = split;
  for my $field (@line[3 .. $#line]) {
    say "@line[0 .. 2] $field";
  }
}

__END__
chr1 762440 762981 SAMD11 
chr1 858932 859148 KLHL17 SAMD11 NOC2L 
chr1 859786 860145 KLHL17 SAMD11 NOC2L
chr1 890663 891747 KLHL17 NOC2L  SAMD11  HES4
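
Since the question is also tagged python, here is a minimal sketch of the same idea in Python. The function name expand_rows and the inline sample string are only illustrative assumptions; in practice the lines would come from the data file.

```python
def expand_rows(lines):
    """Yield one 'col1 col2 col3 name' row per name found on each input line."""
    for line in lines:
        fields = line.split()  # split on any run of whitespace
        if len(fields) < 4:
            continue  # skip blank lines and rows that carry no names
        prefix = " ".join(fields[:3])
        for name in fields[3:]:
            yield f"{prefix} {name}"


# Sample data copied from the question (hypothetical stand-in for the real file).
sample = """\
chr1 762440 762981 SAMD11
chr1 858932 859148 KLHL17 SAMD11 NOC2L
chr1 859786 860145 KLHL17 SAMD11 NOC2L
chr1 890663 891747 KLHL17 NOC2L  SAMD11  HES4
"""

rows = list(expand_rows(sample.splitlines()))
print("\n".join(rows))
```

To process a real file, an open file object can be passed instead of the sample lines, for example expand_rows(open("file")), because split() with no arguments also discards the trailing newline.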

Tags: python, bash, sed, perl

Asked by Angelo
