Good afternoon,
I'm trying to make a bash script that cleans out some data output files. The files look like this:
/path/
/path/to
/path/to/keep
/another/
/another/path/
/another/path/to
/another/path/to/keep
I'd like to end up with this:
/path/to/keep
/another/path/to/keep
I want to cycle through lines of the file, checking the next line to see if it contains the current line, and if so, delete the current line from the file. Here's my code:
for LINE in $(cat bbutters_data2.txt)
do
    # -q: quiet; succeed only if some longer line starts with this one
    if grep -q "^${LINE}." bbutters_data2.txt
    then
        # double quotes so ${LINE} expands; use | as the sed delimiter
        # because the paths themselves contain /
        sed -i "\|^${LINE}\$|d" ./bbutters_data2.txt
    fi
done
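The same idea can also be done in one pass without sed, by comparing each line against the one before it. This is only a sketch, assuming the file is sorted as shown (the filename and sample data are taken from the question above):

```shell
# Sample data from the question.
cat > bbutters_data2.txt <<'EOF'
/path/
/path/to
/path/to/keep
/another/
/another/path/
/another/path/to
/another/path/to/keep
EOF

{
    prev=""
    while IFS= read -r line; do
        # ${line#"$prev"} strips $prev from the front of $line;
        # if nothing was stripped, $prev is not a prefix, so keep it.
        if [ -n "$prev" ] && [ "${line#"$prev"}" = "$line" ]; then
            printf '%s\n' "$prev"
        fi
        prev="$line"
    done < bbutters_data2.txt
    # The final line has nothing after it to extend it, so keep it.
    printf '%s\n' "$prev"
} > cleaned.txt
cat cleaned.txt
```

Because it only ever compares adjacent lines, this reads the file once instead of re-scanning it with grep and sed for every line.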
Assuming that your input file is sorted in the way that you have shown:
$ awk 'NR>1 && substr($0,1,length(last))!=last {print last;} {last=$0;} END{print last}' file
/path/to/keep
/another/path/to/keep
awk reads through the input file line by line. Each time it reads a new line, it compares it with the previous one: if the new line does not start with the previous line, the previous line is printed. In more detail:
NR>1 && substr($0,1,length(last))!=last {print last;}
If this is not the first line, and the previous line (saved in the variable last) is not a prefix of the current line ($0), then print the previous line.
{last=$0;}
Update the variable last to the current line.
END{print last}
After we finish reading the file, print the last line.
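If the input is not already sorted, piping it through sort first puts every prefix directly before the path that extends it, so the same awk program still works. A minimal sketch (the sample data is the question's; note the output order then follows sort order, not the original file order):

```shell
# Recreate the question's sample data, deliberately unsorted here.
cat > file <<'EOF'
/another/path/to/keep
/path/
/path/to
/path/to/keep
/another/
/another/path/
/another/path/to
EOF

# sort groups each prefix chain together; awk then drops every line
# that is a prefix of the line after it.
sort file |
awk 'NR>1 && substr($0,1,length(last))!=last {print last;} {last=$0;} END{print last}' > deduped.txt
cat deduped.txt
```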