I have a utility script in Python:
#!/usr/bin/env python
import sys

unique_lines = []
duplicate_lines = []

for line in sys.stdin:
    if line in unique_lines:
        duplicate_lines.append(line)
    else:
        unique_lines.append(line)
        sys.stdout.write(line)

# optionally do something with duplicate_lines
This simple functionality (uniq without needing to sort first, stable ordering) must be available as a simple UNIX utility, mustn't it? Maybe a combination of filters in a pipe?
Reason for asking: I need this functionality on a system on which I cannot execute Python from anywhere.
The UNIX Bash Scripting blog suggests:
awk '!x[$0]++'
This command tells awk which lines to print. The variable $0 holds the entire contents of a line, and square brackets are array access. So, for each line of the file, the node of the array x is incremented and the line is printed if the content of that node was not (!) previously set.
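For example, with a small sample input (the lines below are only an illustration), the one-liner keeps the first occurrence of each line and preserves the original order, just like the Python script above:

$ printf 'apple\nbanana\napple\ncherry\nbanana\n' | awk '!x[$0]++'
apple
banana
cherry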
A late answer - I just ran into a duplicate of this - but perhaps worth adding...

The principle behind @1_CR's answer can be written more concisely, using cat -n instead of awk to add line numbers:
cat -n file_name | sort -uk2 | sort -n | cut -f2-
- cat -n to prepend line numbers
- sort -u to remove duplicate data (-k2 says 'start at field 2 for sort key')
- sort -n to sort by prepended number
- cut to remove the line numbering (-f2- says 'select field 2 till end')
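To see how the stages fit together, here is a hypothetical run (file_name and its contents are only an illustration). Note that cat -n separates the line number from the text with a tab, which is what cut -f2- relies on, and that GNU sort -u keeps the first line of each set with an equal key, so the first occurrence of each duplicate survives:

$ printf 'apple\nbanana\napple\ncherry\nbanana\n' > file_name
$ cat -n file_name | sort -uk2
     1	apple
     2	banana
     4	cherry
$ cat -n file_name | sort -uk2 | sort -n | cut -f2-
apple
banana
cherry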