Good afternoon,
I'm trying to make a bash script that cleans out some data output files. The files look like this:
/path/
/path/to
/path/to/keep
/another/
/another/path/
/another/path/to
/another/path/to/keep
I'd like to end up with this:
/path/to/keep
/another/path/to/keep
I want to cycle through lines of the file, checking the next line to see if it contains the current line, and if so, delete the current line from the file. Here's my code:
for LINE in $(cat bbutters_data2.txt)
do
    # -q: quiet; succeed only if some longer line starts with this one
    if grep -q "^${LINE}." bbutters_data2.txt
    then
        # double quotes so ${LINE} expands; use | as the sed delimiter
        # because the paths themselves contain /
        sed -i "\|^${LINE}\$|d" ./bbutters_data2.txt
    fi
done
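The same idea can also be done in one pass without sed, by comparing each line against the one before it. This is only a sketch, assuming the file is sorted as shown (the filename and sample data are taken from the question above):

```shell
# Sample data from the question.
cat > bbutters_data2.txt <<'EOF'
/path/
/path/to
/path/to/keep
/another/
/another/path/
/another/path/to
/another/path/to/keep
EOF

{
    prev=""
    while IFS= read -r line; do
        # ${line#"$prev"} strips $prev from the front of $line;
        # if nothing was stripped, $prev is not a prefix, so keep it.
        if [ -n "$prev" ] && [ "${line#"$prev"}" = "$line" ]; then
            printf '%s\n' "$prev"
        fi
        prev="$line"
    done < bbutters_data2.txt
    # The final line has nothing after it to extend it, so keep it.
    printf '%s\n' "$prev"
} > cleaned.txt
cat cleaned.txt
```

Because it only ever compares adjacent lines, this reads the file once instead of re-scanning it with grep and sed for every line.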
Assuming that your input file is sorted in the way that you have shown:
$ awk 'NR>1 && substr($0,1,length(last))!=last {print last;} {last=$0;} END{print last}' file
/path/to/keep
/another/path/to/keep
awk reads through the input file line by line. Each time it reads a new line, it compares it with the previous one: if the new line does not start with the previous line, the previous line is printed. In more detail:
NR>1 && substr($0,1,length(last))!=last {print last;}
If this is not the first line, and the previous line (saved in the variable last) is not a prefix of the current line ($0), then print the previous line.
{last=$0;}
Update the variable last to the current line.
END{print last}
After we finish reading the file, print the last line.
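If the input is not already sorted, piping it through sort first puts every prefix directly before the path that extends it, so the same awk program still works. A minimal sketch (the sample data is the question's; note the output order then follows sort order, not the original file order):

```shell
# Recreate the question's sample data, deliberately unsorted here.
cat > file <<'EOF'
/another/path/to/keep
/path/
/path/to
/path/to/keep
/another/
/another/path/
/another/path/to
EOF

# sort groups each prefix chain together; awk then drops every line
# that is a prefix of the line after it.
sort file |
awk 'NR>1 && substr($0,1,length(last))!=last {print last;} {last=$0;} END{print last}' > deduped.txt
cat deduped.txt
```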