
Delete line from text file with line numbers from another file

I have a text file containing a very long list of line numbers that I have to remove from another, main file. Here's what my data looks like:

lines.txt

1
2
4
5
22
36
400
...

and documents.txt

string1
string2
string3
...

If I had a short list of line numbers I could've easily used

sed -i '1d;4d;5d' documents.txt

But there are lots and lots of line numbers that I have to delete. I could also use a bash/perl script to store the line numbers in an array and echo the lines which are not in the array, but I was wondering if there is a built-in command that does just that.

Any help would be highly appreciated.

javaCity asked Jul 06 '12 21:07


3 Answers

An awk one-liner should work for you; see the test below:

kent$  head lines.txt doc.txt 
==> lines.txt <==
1
3
5
7

==> doc.txt <==
a
b
c
d
e
f
g
h

kent$  awk 'NR==FNR{l[$0];next;} !(FNR in l)' lines.txt doc.txt
b
d
f
h

As Levon suggested, here is some explanation:

awk                     # the awk command
 'NR==FNR{l[$0];next;}  # first file (lines.txt): store each line (a line number to delete) as a key of the array "l"
 !(FNR in l)'           # second file (doc.txt): print the line only if its line number is not a key of "l"
 lines.txt              # 1st argument: lines.txt
 doc.txt                # 2nd argument: doc.txt
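
The one-liner above prints the result to standard output, while the question edits the file in place with sed -i. A minimal sketch of one way to get the same effect with awk, assuming it is acceptable to write a temporary file next to the original: filter into the temporary file and rename it over the original only if awk succeeded. The scratch directory, the workdir variable, and the doc.txt.new name are illustrative choices, not part of the answer.

```shell
# Demo data in a scratch directory (file names follow the test above).
workdir=$(mktemp -d)
printf '%s\n' 1 3 5 7 > "$workdir/lines.txt"
printf '%s\n' a b c d e f g h > "$workdir/doc.txt"

# Filter into a temporary file, then replace the original only if awk succeeded.
awk 'NR==FNR { l[$0]; next } !(FNR in l)' \
    "$workdir/lines.txt" "$workdir/doc.txt" > "$workdir/doc.txt.new" &&
    mv "$workdir/doc.txt.new" "$workdir/doc.txt"

# Show the rewritten file.
cat "$workdir/doc.txt"
```

Redirecting straight back to doc.txt would not work, because the shell truncates the file before awk reads it; that is why the sketch goes through a second file.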
Kent answered Nov 15 '22 06:11


Well, I speak no Perl, and in bash I only get there by painful trial after trial after trial. However, Rexx would do this easily:

lines_to_delete = ""

do while lines( "lines.txt" )
   lines_to_delete = lines_to_delete linein( "lines.txt" )
end

n = 0
do while lines( "documents.txt" )
   line = linein( "documents.txt" )
   n = n + 1
   if ( wordpos( n, lines_to_delete ) == 0 ) then
      call lineout "temp_out.txt", line
end

This will leave your output in temp_out.txt which you may rename to documents.txt as desired.

Wes Miller answered Nov 15 '22 07:11


Here's a way to do it with sed:

sed ':a;${s/\n//g;s/^/sed \o47/;s/$/d\o47 documents.txt/;b};s/$/d\;/;N;ba' lines.txt | sh

It uses sed to build a sed command and pipes it to the shell to be executed. The resulting sed command simply looks like sed '3d;5d;11d' documents.txt.

To build it, the outer sed command adds a d; after each number and loops to the next line, branching back to the beginning (N; ba). When the last line is reached ($), all the newlines are removed, sed ' is prepended, and the final d and ' documents.txt are appended. Then b branches out of the :a - ba loop to the end, since no label is specified.
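
The same build-a-sed-script idea can also be sketched without the loop and without piping to sh. This is a simpler variant, not the command above: each line number N becomes the one-line sed command Nd, the commands are saved to a script file, and sed -f runs that script against the document. The delete.sed file name and the workdir variable are made up for illustration, and the sketch assumes lines.txt holds one plain number per line (anything else is skipped rather than turned into a command).

```shell
# Demo data: three line numbers and a 12-line document.
workdir=$(mktemp -d)
printf '%s\n' 3 5 11 > "$workdir/lines.txt"
awk 'BEGIN { for (i = 1; i <= 12; i++) print "string" i }' > "$workdir/documents.txt"

# Turn every purely numeric line N into the sed command "Nd".
# -n with the p flag emits only lines that matched, so blank or
# malformed lines never become a bare "d" that would delete everything.
sed -n 's/^[0-9][0-9]*$/&d/p' "$workdir/lines.txt" > "$workdir/delete.sed"

# Run the generated script against the document.
result=$(sed -f "$workdir/delete.sed" "$workdir/documents.txt")
printf '%s\n' "$result"
```

Keeping the generated commands in a file rather than on a command line also avoids argument-length limits when lines.txt is very large.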

Here's how you can do it using join and cat -n (assuming that lines.txt is sorted):

join -t $'\v' -v 2 -o 2.2 lines.txt <(cat -n documents.txt | sed 's/^ *//;s/\t/\v/')

If lines.txt isn't sorted:

join -t $'\v' -v 2 -o 2.2 <(sort lines.txt) <(cat -n documents.txt | sed 's/^ *//;s/\t/\v/')

Edit:

Fixed a bug in the join commands in which the original versions only output the first word of each line in documents.txt.

Dennis Williamson answered Nov 15 '22 07:11