Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a way to 'uniq' by column?

People also ask

How do you find duplicates in a column in Unix?

uniq command has an option "-d" which lists out only the duplicate records. sort command is used since the uniq command works only on sorted files. uniq command without the "-d" option will delete the duplicate records.

How do I sort a column in Linux?

Use the -k option to sort on a certain column. For example, use “-k 2” to sort on the second column.

Does uniq require sort?

The uniq command accepts input from a text-based file and removes any repeated lines, only if they are adjacent to each other. That's why it's used in conjunction with sort to remove non-adjacent lines. Case differences can be ignored when dropping duplicate adjacent lines, using the -i option.

What does uniq command do?

The uniq command can count and print the number of repeated lines. Just like duplicate lines, we can filter unique lines (non-duplicate lines) as well and can also ignore case sensitivity. We can skip fields and characters before comparing duplicate lines and also consider characters for filtering lines.


sort -u -t, -k1,1 file
  • -u for unique
  • -t, so comma is the delimiter
  • -k1,1 for the key field 1

Test result:

[email protected],2009-11-27 00:58:29.793000000,xx3.net,255.255.255.0 
[email protected],2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1 

awk -F"," '!_[$1]++' file
  • -F sets the field separator.
  • $1 is the first field.
  • _[val] looks up val in the hash _(a regular variable).
  • ++ increment, and return old value.
  • ! returns logical not.
  • there is an implicit print at the end.

To consider multiple column.

Sort and give unique list based on column 1 and column 3:

sort -u -t : -k 1,1 -k 3,3 test.txt
  • -t : colon is separator
  • -k 1,1 -k 3,3 based on column 1 and column 3

or if u want to use uniq:

<mycvs.cvs tr -s ',' ' ' | awk '{print $3" "$2" "$1}' | uniq -c -f2

gives:

1 01:05:47.893000000 2009-11-27 [email protected]
2 00:58:29.793000000 2009-11-27 [email protected]
1

If you want to retain the last one of the duplicates you could use

 tac a.csv | sort -u -t, -r -k1,1 |tac

Which was my requirement

here

tac will reverse the file line by line