
BASH - Tell if duplicate lines exist (y/n)

Tags: file, bash, uniq

I am writing a script to manipulate a text file.

The first thing I want to do is check whether duplicate entries exist and, if so, ask the user whether they want to keep or remove them.

I know how to display duplicate lines if they exist, but what I want to learn is just how to get a yes/no answer to the question "Do duplicates exist?"

It seems uniq returns 0 whether or not duplicates were found, as long as the command completed without issues.

What command can I put in an if statement just to tell me whether duplicate lines exist?

My file is very simple: it is just values in a single column.

DMS asked May 14 '26 22:05


2 Answers

I'd probably use awk to do this but, for the sake of variety, here is a brief pipe to accomplish the same thing:

$ { sort | uniq -d | grep . -qc; } < noduplicates.txt; echo $?
1
$ { sort | uniq -d | grep . -qc; } < duplicates.txt; echo $?
0

sort + uniq -d make sure that only duplicate lines (which don't have to be adjacent in the file) get printed to stdout. grep . -c counts those lines, emulating wc -l, with the useful side effect that grep returns 1 if nothing matches (i.e. a zero count). -q silences the output so the line count isn't printed and you can use the pipe quietly in your script. Because only the exit status is used, grep -q . on its own should behave the same way.

has_duplicates()
{
  {
    sort | uniq -d | grep . -qc
  } < "$1"
}

if has_duplicates myfile.txt; then
  echo "myfile.txt has duplicate lines"
else
  echo "myfile.txt has no duplicate lines"
fi
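
As a variation on the function above, here is a minimal sketch that skips grep and tests whether uniq -d printed anything at all. The function name has_duplicates_out and the sample files are made up for illustration. Like the grep . version, it does not count repeated empty lines as duplicates, because command substitution strips trailing newlines.

```shell
#!/usr/bin/env bash
# Hypothetical variant of has_duplicates: succeed (exit 0) if the file named
# by $1 contains at least one repeated non-empty line.
has_duplicates_out() {
  # sort makes repeated lines adjacent; uniq -d prints one copy of each
  # repeated line. [ -n ... ] succeeds only when that output is non-empty.
  [ -n "$(sort -- "$1" | uniq -d)" ]
}

# Demo with throwaway sample files (contents invented for this example).
tmpdir=$(mktemp -d)
printf '%s\n' a b c   > "$tmpdir/noduplicates.txt"
printf '%s\n' a b a c > "$tmpdir/duplicates.txt"

for f in "$tmpdir/noduplicates.txt" "$tmpdir/duplicates.txt"; do
  if has_duplicates_out "$f"; then
    echo "${f##*/}: duplicates found"
  else
    echo "${f##*/}: no duplicates"
  fi
done

rm -rf "$tmpdir"
```

The trade-off is that the whole list of duplicates is held in memory before the test, which should be fine for a simple single-column file.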
Adrian Frühwirth answered May 16 '26 12:05


You can use awk combined with the boolean || operator:

# Ask the question only if awk found a duplicate line
awk 'a[$0]++{exit 1}' test.txt || (
    echo -n "remove duplicates? [y/n] "
    read -r answer
    # Remove duplicates if the answer was "y". `[` is shorthand for the
    # test command; see `help [`.
    # awk '!a[$0]++' keeps the first copy of every line. Plain uniq would
    # only collapse duplicates that are adjacent.
    [ "$answer" = "y" ] && awk '!a[$0]++' test.txt > test.uniq.txt
)

The block after the || is only executed if the awk command exits with status 1, meaning it found duplicates.
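
If the inverted logic (non-zero means "duplicates found") feels awkward, one option is to flip the exit status inside awk so the check reads naturally in an if statement. This is only a sketch; the function name awk_has_duplicates and the variable names seen and dup are my own, not part of the command above.

```shell
# Hypothetical helper: exit 0 when the file named by $1 contains a repeated
# line, exit 1 when every line is unique.
awk_has_duplicates() {
  # The first rule fires on the second occurrence of a line, records it and
  # jumps to END. END then turns the flag into the exit status:
  # dup=1 gives exit 0, unset dup gives exit 1.
  awk 'seen[$0]++ { dup = 1; exit }
       END        { exit !dup }' "$1"
}

# Demo with a throwaway sample file (contents invented for this example).
tmp=$(mktemp)
printf '%s\n' a b a > "$tmp"
if awk_has_duplicates "$tmp"; then
  echo "duplicates found"
else
  echo "no duplicates"
fi
rm -f "$tmp"
```

Because awk stops at the first repeated line, it does not need to sort or read the rest of the file.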

However, for a basic understanding, I'll also show an example using an if block:

awk 'a[$0]++{exit 1}' test.txt

# $? contains the exit status of the last command
if [ $? -ne 0 ] ; then
    echo -n "remove duplicates? [y/n] "
    read -r answer
    # check answer
    if [ "$answer" = "y" ] ; then
        # Keep the first copy of each line. Plain uniq would only
        # collapse duplicates that are adjacent.
        awk '!a[$0]++' test.txt > test.uniq.txt
    fi
fi

However, the [ ] are not just brackets as in other programming languages. [ is a synonym for the bash builtin command test, and ] is its last argument. Read help [ to understand the details.
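
To make the point about [ concrete, here is a small sketch showing that test and [ do the same job and that the result is just an exit status. The value of answer is a sample chosen for illustration. It uses = for the string comparison, which is the POSIX spelling; bash also accepts == inside [.

```shell
answer="y"   # sample value for illustration

# These two lines run the same test; only the spelling differs.
test "$answer" = "y" && echo "test: matched"
[ "$answer" = "y" ]  && echo "[: matched"

# Because [ is an ordinary command, it reports its result through the exit
# status, exactly like awk or grep do.
status=0
[ "$answer" = "n" ] || status=$?
echo "exit status of a failed test: $status"
```

This is also why the spaces after [ and before ] are required: they separate the command name and its arguments.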

hek2mgl answered May 16 '26 12:05