Is there a way to 'uniq' by column?

People also ask

How do you find duplicates in a column in Unix?

The uniq command has a "-d" option which lists only the duplicated records. The sort command is used first, since uniq only detects duplicates on adjacent lines. Without "-d", uniq instead removes the duplicate records, printing each line once.
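
For example, assuming a plain text file named file.txt, this prints one copy of every line that occurs more than once:

sort file.txt | uniq -d     # sort first so duplicates become adjacent, then list them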

How do I sort a column in Linux?

Use the -k option to sort on a certain column. For example, use “-k 2” to sort on the second column.
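
For example, with a hypothetical whitespace-separated file file.txt:

sort -k 2 file.txt      # sort starting at the second column (through the end of the line)
sort -k 2,2 file.txt    # sort on the second column only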

Does uniq require sort?

The uniq command accepts input from a text-based file and removes any repeated lines, but only if they are adjacent to each other. That's why it's used in conjunction with sort, which brings non-adjacent duplicates together. Case differences can be ignored when dropping duplicate adjacent lines by using the -i option.
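
For example, assuming a file names.txt where the same name may appear with different capitalization:

sort -f names.txt | uniq -i    # -f folds case while sorting, -i ignores case while de-duplicating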

What does uniq command do?

The uniq command can count and print the number of repeated lines (-c). Just as it can report duplicate lines (-d), it can filter for lines that are not duplicated at all (-u), and it can ignore case (-i). It can also skip a number of fields (-f) or characters (-s) before comparing lines, or limit the comparison to a fixed number of characters (-w).
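
A common combination, assuming a hypothetical file words.txt, is to count occurrences and show the most frequent lines first:

sort words.txt | uniq -c | sort -rn    # -c prefixes each line with its count, then sort counts descending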


sort -u -t, -k1,1 file
  • -u for unique
  • -t, so comma is the delimiter
  • -k1,1 for the key field 1

Test result:

[email protected],2009-11-27 00:58:29.793000000,xx3.net,255.255.255.0 
[email protected],2009-11-27 01:05:47.893000000,xx2.net,127.0.0.1 

awk -F"," '!_[$1]++' file
  • -F sets the field separator.
  • $1 is the first field.
  • _[val] looks up val in the hash _ (an ordinary awk variable used as an associative array).
  • ++ increments it and returns the old value (post-increment).
  • ! is logical NOT, so the whole expression is true only the first time a given value of $1 is seen.
  • there is an implicit print at the end.
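
The same idiom written out long-hand (with a hypothetical array name seen and the print made explicit) may make the mechanics clearer:

awk -F"," '{ if (seen[$1]++ == 0) print }' file    # print a line only the first time its first field appears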

To consider multiple columns.

Sort and give a unique list based on column 1 and column 3:

sort -u -t : -k 1,1 -k 3,3 test.txt
  • -t : colon is separator
  • -k 1,1 -k 3,3 based on column 1 and column 3
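
The awk approach can be extended the same way; something like this (seen is just a hypothetical array name) keeps the first line for each unique combination of columns 1 and 3, preserving the original line order instead of sorting:

awk -F: '!seen[$1 FS $3]++' test.txt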

Or if you want to use uniq:

<mycvs.cvs tr -s ',' ' ' | awk '{print $3" "$2" "$1}' | uniq -c -f2

Here the fields are reordered so the email ends up last, and -f2 makes uniq skip the first two fields when comparing, so lines are counted and de-duplicated on the email alone.

gives:

1 01:05:47.893000000 2009-11-27 [email protected]
2 00:58:29.793000000 2009-11-27 [email protected]
1

If you want to retain the last one of the duplicates, you could use

 tac a.csv | sort -u -t, -r -k1,1 | tac

which was my requirement here.

tac will reverse the file line by line.
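
An equivalent trick with awk (seen is a hypothetical array name) also keeps the last occurrence of each key, and preserves the original line order rather than sorting:

tac a.csv | awk -F, '!seen[$1]++' | tac    # reverse, keep first occurrence (i.e. the original last), reverse back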

