Delete line from text file with line numbers from another file

Question

I have a text file containing a long list of line numbers that I have to remove from another, main file. Here's what my data looks like:

lines.txt

and documents.txt

string1
string2
string3
...

If I had a short list of line numbers, I could easily have used

sed -i '1d;4d;5d' documents.txt

But there are a great many line numbers that I have to delete. I could also use a bash/perl script to store the line numbers in an array and echo the lines that are not in the array, but I was wondering whether there is a built-in command to do just that.

Any help would be highly appreciated.

Kent · Accepted Answer

An awk one-liner should work for you; see the test below:

kent$  head lines.txt doc.txt 
==> lines.txt <==
1
3
5
7

==> doc.txt <==
a
b
c
d
e
f
g
h

kent$  awk 'NR==FNR{l[$0];next;} !(FNR in l)' lines.txt doc.txt
b
d
f
h

As Levon suggested, here is some explanation:

awk                     # the awk command
 'NR==FNR{l[$0];next;}  # first file (lines.txt): save each line (a line number to delete) as a key in the array "l"

 !(FNR in l)'           # second file (doc.txt): if the line number is not in "l", print the line
 lines.txt              # 1st argument: lines.txt
 doc.txt                # 2nd argument: doc.txt
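The one-liner prints to standard output and leaves the input file untouched. The following is a minimal sketch, not part of the answer above, of how the result could be written back to the file. The function name delete_lines is hypothetical. It also swaps NR==FNR for FILENAME == ARGV[1], on the assumption that an empty list of line numbers should leave the file unchanged (with NR==FNR, an empty first file makes the condition true for every line of the second file, so nothing is printed).

```shell
# Hypothetical helper: delete from file "$2" the lines whose numbers are listed in file "$1".
# Assumes "$1" holds one line number per line, with no padding or leading zeros.
delete_lines() {
    lines_file=$1
    doc_file=$2
    tmp_file=$(mktemp) || return 1
    # FILENAME == ARGV[1] is true only while awk reads the first file,
    # even if that file turns out to be empty.
    if awk 'FILENAME == ARGV[1] { skip[$0]; next } !(FNR in skip)' \
            "$lines_file" "$doc_file" > "$tmp_file"; then
        # Copy the result back instead of moving it, so the original
        # file keeps its permissions.
        cat "$tmp_file" > "$doc_file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp_file"
    return "$status"
}
```

With the test files shown earlier, it would be called as delete_lines lines.txt doc.txt.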

Wes Miller · Answer

Well, I speak no Perl, and what bash I write comes only after painful trial after trial. However, Rexx would do this easily:

lines_to_delete = ""

do while lines( "lines.txt" )
   lines_to_delete = lines_to_delete linein( "lines.txt" )
end

n = 0
do while lines( "documents.txt" )
   line = linein( "documents.txt" )
   n = n + 1
   if ( wordpos( n, lines_to_delete ) == 0 ) then
      call lineout "temp_out.txt", line
end

This will leave your output in temp_out.txt which you may rename to documents.txt as desired.

Dennis Williamson · Answer

Here's a way to do it with sed:

sed ':a;${s/\n//g;s/^/sed \o47/;s/$/d\o47 documents.txt/;b};s/$/d\;/;N;ba' lines.txt | sh

It uses sed to build a sed command and pipes it to the shell to be executed. The resulting sed command simply looks like sed '3d;5d;11d' documents.txt.

To build it, the outer sed command appends d; to each number and loops to the next line, branching back to the beginning (N;ba). When the last line is reached ($), all the newlines are removed, sed ' is prepended, and the final d and ' documents.txt are appended. Then b branches out of the :a - ba loop to the end of the script, since no label is specified.
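The same idea, turning each number N into the sed command Nd, can also be sketched without piping a generated command line to sh. This is a different technique from the one-liner above: it writes the commands to a temporary script file and passes it to sed with -f. The function name print_without_lines is hypothetical, and the sketch assumes lines.txt holds one positive integer per line and no blank lines (a blank line would become a bare d, which deletes every line).

```shell
# Hypothetical helper: print file "$2" without the lines whose numbers are listed in file "$1".
# Assumes "$1" holds one positive integer per line and nothing else.
print_without_lines() {
    script=$(mktemp) || return 1
    # Append "d" to every number, so "3" becomes the sed command "3d".
    sed 's/$/d/' "$1" > "$script"
    # Run the generated commands against the document.
    sed -f "$script" "$2"
    status=$?
    rm -f "$script"
    return "$status"
}
```

It would be called as print_without_lines lines.txt documents.txt, with the output redirected to a new file as needed.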

Here's how you can do it using join and cat -n (assuming that lines.txt is sorted):

join -t $'\v' -v 2 -o 2.2 lines.txt <(cat -n documents.txt | sed 's/^ *//;s/	/\v/')

If lines.txt isn't sorted:

join -t $'\v' -v 2 -o 2.2 <(sort lines.txt) <(cat -n documents.txt | sed 's/^ *//;s/	/\v/')

Edit:

Fixed a bug in the join commands in which the original versions only output the first word of each line in documents.txt.
