Removing lines with repeated values in the last column

Question

I have a tab-delimited file that looks like this:

chr1  12226559  12227059  TNFRSF1B       
chr1  17051560  17052060                 
chr1  17053279  17053779                 
chr1  17338423  17338923  ATP13A2        
                          ATP13A2        
                          ATP13A2        
chr1  19577574  19578074  EMC1           
                          MRTO4          
chr1  19578046  19578546  EMC1           
                          MRTO4          
chr1  19638239  19638739  AKR7A2         
                          PQLC2          
                          PQLC2          
                          PQLC2
                          AKR7A2         
                          PQLC2

I want to remove the lines where the value in column 4 is repeated.

The first three columns are coordinates, and whatever is found at those coordinates is listed in column 4. For each coordinate I want only the unique names, with no repetition.

I want output like this:

chr1  12226559  12227059  TNFRSF1B       
chr1  17051560  17052060                 
chr1  17053279  17053779                 
chr1  17338423  17338923  ATP13A2              
chr1  19577574  19578074  EMC1           
                          MRTO4          
chr1  19578046  19578546  EMC1           
                          MRTO4          
chr1  19638239  19638739  AKR7A2         
                          PQLC2

Things that I have tried:

sort -k 4 -u file

awk '{if($4==temp1){next;}else{print}temp1=$4}' file

Nothing works :(

Please help

Thank you

glenn jackman · Accepted Answer

You just need

awk '$NF != prev {print} {prev=$NF}' file
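
To illustrate what this one-liner does, here is a minimal sketch on a made-up single-column input. The function name dedup_adjacent and the sample values are assumptions for illustration only. A line is dropped only when its last field equals the last field of the line immediately before it, so a repeat that is not adjacent still gets printed.

```shell
# dedup_adjacent (hypothetical name): print a line unless its last field
# matches the last field of the previous line. Reads files or stdin.
dedup_adjacent() {
    awk '$NF != prev {print} {prev = $NF}' "$@"
}

# The second A is adjacent to the first and is dropped;
# the final A follows B, so it is kept.
printf 'A\nA\nB\nA\n' | dedup_adjacent
# prints:
# A
# B
# A
```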

EDIT: to handle the new input

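awk '{
    if (NF == 1)                          # continuation line: only a name
        value = $1
    else {                                # coordinate line: start a new group
        key =  $1 SUBSEP $2 SUBSEP $3
        value = $4
    }
    if ((key SUBSEP value) in val)        # name already printed for this group
        next
    print
    val[key, value] = 1                   # remember this (coordinates, name) pair
}' input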
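
The same idea can be sketched more compactly with the common `!seen[...]++` idiom. This is only an illustrative variant, not a replacement for the script above: the function name dedup_names and the inline sample (a subset of the data in the question) are assumptions. It relies on awk's default whitespace splitting, so a continuation line that holds only a name has NF == 1.

```shell
# dedup_names (hypothetical name): keep the first occurrence of each
# name within a coordinate group. Reads files or stdin.
dedup_names() {
    awk '
        NF == 1 { name = $1 }                                # continuation line: name only
        NF != 1 { key = $1 SUBSEP $2 SUBSEP $3; name = $4 }  # coordinate line: new group
        !seen[key, name]++                                   # print each (group, name) pair once
    ' "$@"
}

printf 'chr1\t17338423\t17338923\tATP13A2\n\t\t\tATP13A2\nchr1\t19638239\t19638739\tAKR7A2\n\t\t\tPQLC2\n\t\t\tPQLC2\n\t\t\tAKR7A2\n' | dedup_names
```

On this sample the repeated ATP13A2 line and the second PQLC2 and AKR7A2 lines are dropped, leaving one line per name under each set of coordinates.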

Tags: python, bash, awk, perl, bioinformatics

Angelo
