I have a utility script in Python:
#!/usr/bin/env python
import sys

unique_lines = []
duplicate_lines = []

for line in sys.stdin:
    if line in unique_lines:
        duplicate_lines.append(line)
    else:
        unique_lines.append(line)
        sys.stdout.write(line)

# optionally do something with duplicate_lines
This simple functionality (uniq without needing to sort first, stable ordering) must be available as a simple UNIX utility, mustn't it? Maybe a combination of filters in a pipe?
Reason for asking: I need this functionality on a system on which I cannot execute Python from anywhere.
The UNIX Bash Scripting blog suggests:
awk '!x[$0]++'
This command tells awk which lines to print. The variable $0 holds the entire contents of a line, and square brackets are array access. So, for each line of the file, the node of the array x is incremented and the line is printed if the content of that node was not (!) previously set.
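For example, with a small sample input (the lines below are only an illustration), the one-liner keeps the first occurrence of each line and preserves the original order, just like the Python script above:

$ printf 'apple\nbanana\napple\ncherry\nbanana\n' | awk '!x[$0]++'
apple
banana
cherry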
A late answer - I just ran into a duplicate of this - but perhaps worth adding...

The principle behind @1_CR's answer can be written more concisely, using cat -n instead of awk to add line numbers:
cat -n file_name | sort -uk2 | sort -n | cut -f2-
- cat -n to prepend line numbers
- sort -u to remove duplicate data (-k2 says 'start at field 2 for sort key')
- sort -n to sort by prepended number
- cut to remove the line numbering (-f2- says 'select field 2 till end')
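To see how the stages fit together, here is a hypothetical run (file_name and its contents are only an illustration). Note that cat -n separates the line number from the text with a tab, which is what cut -f2- relies on, and that GNU sort -u keeps the first line of each set with an equal key, so the first occurrence of each duplicate survives:

$ printf 'apple\nbanana\napple\ncherry\nbanana\n' > file_name
$ cat -n file_name | sort -uk2
     1	apple
     2	banana
     4	cherry
$ cat -n file_name | sort -uk2 | sort -n | cut -f2-
apple
banana
cherry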