I am writing a script to manipulate a text file.
First thing I want to do is check if duplicate entries exist and if so, ask the user whether we wants to keep or remove them.
I know how to display duplicate lines if they exist, but what I want to learn is just to get a yes/no answer to the question "Do duplicates exist?"
It seems uniq will return 0 either if duplicates were found or not as long as the command completed without issues.
What is that command that I can put in an if-statement just to tell me if duplicate lines exist?
My file is very simple, it is just values in single column.
I'd probably use awk to do this but, for the sake of variety, here is a brief pipe to accomplish the same thing:
$ { sort | uniq -d | grep . -qc; } < noduplicates.txt; echo $?
1
$ { sort | uniq -d | grep . -qc; } < duplicates.txt; echo $?
0
sort + uniq -d make sure that only duplicate lines (which don't have to be adjacent) get printed to stdout and grep . -c counts those lines emulating wc -l with the useful side effect that it returns 1 if it doesn't match (i.e. a zero count) and -q just silents the output so it doesn't print the line count so you can use it silently in your script.
has_duplicates()
{
{
sort | uniq -d | grep . -qc
} < "$1"
}
if has_duplicates myfile.txt; then
echo "myfile.txt has duplicate lines"
else
echo "myfile.txt has no duplicate lines"
fi
You can use awk combined with the boolean || operator:
# Ask question if awk found a duplicate
awk 'a[$0]++{exit 1}' test.txt || (
echo -n "remove duplicates? [y/n] "
read answer
# Remove duplicates if answer was "y" . I'm using `[` the shorthand
# of the test command. Check `help [`
[ "$answer" == "y" ] && uniq test.txt > test.uniq.txt
)
The block after the || will only get executed if the awk command returns 1, meaning it found duplicates.
However, for a basic understanding I'll also show an example using an if block
awk 'a[$0]++{exit 1}' test.txt
# $? contains the return value of the last command
if [ $? != 0 ] ; then
echo -n "remove duplicates? [y/n] "
read answer
# check answer
if [ "$answer" == "y" ] ; then
uniq test.txt > test.uniq.txt
fi
fi
However the [] are not just brackets like in other programming languages. [ is a synonym for the test bash builtin command and ] it's last argument. You need to read help [ in order to understand
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With