Bash script to remove redundant lines

Tags:

bash

Good afternoon,

I'm trying to write a bash script that cleans up some data output files. The files look like this:

/path/
/path/to
/path/to/keep
/another/
/another/path/
/another/path/to
/another/path/to/keep

I'd like to end up with this:

/path/to/keep
/another/path/to/keep

I want to cycle through lines of the file, checking the next line to see if it contains the current line, and if so, delete the current line from the file. Here's my code:

for LINE in $(cat bbutters_data2.txt)
do
    grep -A1 ${LINE} bbutters_data2.txt
    if [ $? -eq 0 ]
    then
       sed -i '/${LINE}/d' ./bbutters_data2.txt
    fi
done
tcarter_compete asked May 08 '15 17:05

1 Answer

Assuming that your input file is ordered the way you have shown (each path is immediately followed by its longer extensions):

$ awk 'NR>1 && substr($0,1,length(last))!=last {print last;} {last=$0;} END{print last}' file
/path/to/keep
/another/path/to/keep
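
Since the question's loop edited `bbutters_data2.txt` in place (`sed -i`), here is a minimal sketch of running the same one-liner on the sample data and saving the result back to that file. It recreates the sample from the question first. The output goes to a temporary file that is then moved over the original, because redirecting awk's output straight to the file it is reading (`awk ... file > file`) would truncate the file before awk reads it.

```shell
#!/usr/bin/env bash
set -euo pipefail

file=bbutters_data2.txt

# Recreate the sample input shown in the question.
cat > "$file" <<'EOF'
/path/
/path/to
/path/to/keep
/another/
/another/path/
/another/path/to
/another/path/to/keep
EOF

# Filter into a temporary file, then replace the original with it.
tmp=$(mktemp)
awk 'NR>1 && substr($0,1,length(last))!=last {print last;} {last=$0;} END{print last}' "$file" > "$tmp"
mv "$tmp" "$file"

# Show the cleaned file.
cat "$file"
```

Note that `mktemp` creates its file readable and writable only by the owner, so after the `mv` the data file has those permissions.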

How it works

awk reads through the input file line by line. Every time we read a new line, we compare it to the previous one. If the new line does not begin with the previous line, then we print the previous line. In more detail:

  • NR>1 && substr($0,1,length(last))!=last {print last;}

    If this is not the first line and the current line, $0, does not begin with the previous line (stored in last), then print the previous line.

  • last=$0

    Update the variable last to the current line.

  • END{print last}

    After we finish reading the file, print the last line.
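
For comparison, the same logic can be sketched as a plain bash loop, closer to what the question attempted; the function name `keep_deepest` is made up for illustration. Reading with `while IFS= read -r` avoids the word splitting and globbing that `for LINE in $(cat ...)` performs, and the loop writes the kept lines to standard output instead of editing the file while it is still being read. (In the question's `sed -i '/${LINE}/d'`, `${LINE}` is never expanded, because the shell does not expand variables inside single quotes.)

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a line only when the line after it does not
# start with it, and always print the final line (same rule as the awk).
keep_deepest() {
    local line prev have_prev=0
    # "|| [[ -n $line ]]" also handles a last line with no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # "$prev" is quoted so characters such as * or ? match literally;
        # the trailing * makes this a "starts with" test.
        if (( have_prev )) && [[ $line != "$prev"* ]]; then
            printf '%s\n' "$prev"
        fi
        prev=$line
        have_prev=1
    done
    if (( have_prev )); then
        printf '%s\n' "$prev"
    fi
}

# Usage with the sample data from the question:
printf '%s\n' /path/ /path/to /path/to/keep \
    /another/ /another/path/ /another/path/to /another/path/to/keep | keep_deepest
```

Unlike the awk one-liner, which prints one empty line for an empty file, this version prints nothing. On large files the awk version will generally be much faster, since bash `read` loops are slow.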

John1024 answered Nov 12 '22 07:11