
How to delete duplicated rows based on a column value?

Given the following table

 123456.451 entered-auto_attendant
 123456.451 duration:76 real:76
 139651.526 entered-auto_attendant
 139651.526 duration:62 real:62
 139382.537 entered-auto_attendant 

Using a bash shell script on Linux, I'd like to delete the duplicate rows based on the value of column 1 (the one with the long number). Keep in mind that this number varies from row to row.

I've tried with

awk '{a[$3]++}!(a[$3]-1)' file

sort -u | uniq

But I am not getting the result I want, which would be something like this: compare all the values in the first column, delete the duplicates, and show what remains.

 123456.451 entered-auto_attendant
 139651.526 entered-auto_attendant
 139382.537 entered-auto_attendant 
user3494949 asked Apr 03 '14 21:04


2 Answers

You didn't give an expected output; does this work for you?

 awk '!a[$1]++' file

With your data, the output is:

123456.451 entered-auto_attendant
139651.526 entered-auto_attendant
139382.537 entered-auto_attendant

And this one prints only the lines whose column 1 value appears exactly once:

 awk '{a[$1]++;b[$1]=$0}END{for(x in a)if(a[x]==1)print b[x]}' file

output:

139382.537 entered-auto_attendant
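
One caveat with the END-block version: awk does not specify the iteration order of for (x in a), so if several column 1 values were unique, the lines could come out in a different order than in the input. A possible alternative, sketched here rather than taken from the answer, reads the file twice: the first pass counts each column 1 value and the second prints the lines whose count is 1, in input order. The temporary file and the variable names (file, unique_only, count) are made up for the illustration, and the sample assumes the lines carry no leading whitespace.

```shell
# Sample data from the question, written to a temporary file (path is arbitrary).
file=$(mktemp)
cat > "$file" <<'EOF'
123456.451 entered-auto_attendant
123456.451 duration:76 real:76
139651.526 entered-auto_attendant
139651.526 duration:62 real:62
139382.537 entered-auto_attendant
EOF

# The same file is passed twice.
# Pass 1 (NR == FNR holds only while reading the first copy): count column-1 values.
# Pass 2: print each line whose column-1 value was counted exactly once.
unique_only=$(awk 'NR == FNR { count[$1]++; next } count[$1] == 1' "$file" "$file")

printf '%s\n' "$unique_only"
rm -f "$file"
```

With the question's data only 139382.537 occurs once, so that is the single line printed.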
Kent answered Oct 21 '22 16:10


uniq, by default, compares the entire line. Since your lines are not identical, they are not removed.

You can use sort to conveniently sort by the first field and also delete duplicates of it:

sort -t ' ' -k 1,1 -u file
  • -t ' ': fields are separated by spaces
  • -k 1,1: only look at the first field
  • -u: delete duplicates
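
To see what these options do together, here is a small sketch run against the question's data. The temporary file and the variable names (file, sorted) are only for the illustration. Two things are worth noticing: the result comes back ordered by the first field rather than in input order, and the sample assumes lines with no leading whitespace, because with -t ' ' a line that starts with a space has an empty first field.

```shell
# Sample data from the question, written to a temporary file (path is arbitrary).
file=$(mktemp)
cat > "$file" <<'EOF'
123456.451 entered-auto_attendant
123456.451 duration:76 real:76
139651.526 entered-auto_attendant
139651.526 duration:62 real:62
139382.537 entered-auto_attendant
EOF

# Sort on field 1 only and keep one line per distinct field-1 value.
sorted=$(sort -t ' ' -k 1,1 -u "$file")

printf '%s\n' "$sorted"

# Show just the keys that survived, one per line.
printf '%s\n' "$sorted" | cut -d ' ' -f 1
rm -f "$file"
```

The keys come out as 123456.451, 139382.537, 139651.526, so the last input line moves to the middle. Which of the duplicate lines is kept for each key may depend on the sort implementation; if the first occurrence in input order matters, the awk approach is the safer choice.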

Additionally, you might have seen the awk '!a[$0]++' trick for deduplicating lines. You can make this dedupe on the first column only using awk '!a[$1]++'.
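
If the one-liner looks cryptic, it may help to see it spelled out. The following is a long-hand sketch of the same idea, not a different technique: an array remembers every column 1 value already seen, and a line is printed only the first time its value shows up. The names file, result and seen are arbitrary, and the sample data is the question's, assumed to have no leading whitespace.

```shell
# Sample data from the question, written to a temporary file (path is arbitrary).
file=$(mktemp)
cat > "$file" <<'EOF'
123456.451 entered-auto_attendant
123456.451 duration:76 real:76
139651.526 entered-auto_attendant
139651.526 duration:62 real:62
139382.537 entered-auto_attendant
EOF

# Long-hand equivalent of: awk '!a[$1]++' file
result=$(awk '{
    if (seen[$1] == 0)   # first time this column-1 value appears
        print            # keep the line
    seen[$1]++           # remember it, so later lines with the same value are skipped
}' "$file")

printf '%s\n' "$result"
rm -f "$file"
```

This keeps the first line for each column 1 value and preserves the input order, which matches the output the question asks for.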

that other guy answered Oct 21 '22 15:10