I have a utility script in Python:
    #!/usr/bin/env python
    import sys

    unique_lines = []
    duplicate_lines = []

    for line in sys.stdin:
        if line in unique_lines:
            duplicate_lines.append(line)
        else:
            unique_lines.append(line)
            sys.stdout.write(line)

    # optionally do something with duplicate_lines

This simple functionality (uniq without needing to sort first, stable ordering) must be available as a simple UNIX utility, mustn't it? Maybe a combination of filters in a pipe?
Reason for asking: I need this functionality on a system where I cannot run Python.
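For reference, and only to make the expected output concrete, here is what the script does on a small made-up input (the name unique.py and the sample lines are placeholders, run on a machine where Python is available):

    $ printf 'apple\nbanana\napple\ncherry\nbanana\n' | python unique.py
    apple
    banana
    cherry

Each line is printed the first time it is seen, and the original input order is preserved.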
The UNIX Bash Scripting blog suggests:
    awk '!x[$0]++'

This command tells awk which lines to print. The variable $0 holds the entire contents of a line, and square brackets are array access. So, for each line of the file, the node of the array x is incremented and the line is printed if the content of that node was not (!) previously set.
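As a quick illustration (the sample lines below are made up), the one-liner keeps the first occurrence of each line and preserves the original order:

    $ printf 'apple\nbanana\napple\ncherry\nbanana\n' | awk '!x[$0]++'
    apple
    banana
    cherry

Note that memory use grows with the number of distinct lines, since every unique line is kept as a key of the array x.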
A late answer - I just ran into a duplicate of this - but perhaps worth adding...
The principle behind @1_CR's answer can be written more concisely, using cat -n instead of awk to add line numbers:
    cat -n file_name | sort -uk2 | sort -n | cut -f2-

- cat -n to prepend line numbers
- sort -u to remove duplicate data (-k2 says 'start at field 2 for sort key')
- sort -n to sort by the prepended line number
- cut to remove the line numbering (-f2- says 'select field 2 till end')
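For illustration, the same made-up sample input can be run through this pipeline (file_name and its contents are placeholders; in practice GNU sort -u keeps the earliest line of an equal run, which is what preserves the original ordering):

    $ printf 'apple\nbanana\napple\ncherry\nbanana\n' > file_name
    $ cat -n file_name | sort -uk2 | sort -n | cut -f2-
    apple
    banana
    cherry

The middle sort compares only the text after the line number, so duplicates collapse to a single numbered line; the final sort -n restores the original order before cut strips the numbers.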