I have a text file containing a giant list of line numbers that I need to remove from another, main file. Here's what my data looks like:
lines.txt
1
2
4
5
22
36
400
...
and documents.txt
string1
string2
string3
...
If I had a short list of line numbers, I could easily have used
sed -i '1d;4d;5d' documents.txt
But there are a great many line numbers to delete. I could also use a bash/perl script to store the line numbers in an array and echo the lines that are not in the array, but I was wondering whether there is a built-in command to do just that.
Any help would be highly appreciated.
An awk one-liner should work for you; see the test below:
kent$ head lines.txt doc.txt
==> lines.txt <==
1
3
5
7
==> doc.txt <==
a
b
c
d
e
f
g
h
kent$ awk 'NR==FNR{l[$0];next;} !(FNR in l)' lines.txt doc.txt
b
d
f
h
As Levon suggested, here is some explanation:
awk # the awk command
'NR==FNR{l[$0];next;} # while reading the first file (lines.txt), store each line (a line number to delete) as a key of the array "l"
!(FNR in l)' # for the second file (doc.txt), print the line if its line number is not in "l"
lines.txt # 1st argument: lines.txt
doc.txt # 2nd argument: doc.txt
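Since the question asks for the lines to be removed from the file itself, here is a minimal sketch that wraps the one-liner above in a shell function and writes the result back. The function name delete_lines, the temporary file, and the guard for an empty list are assumptions added for illustration; they are not part of the original answer.

```shell
#!/bin/sh
# delete_lines LINES_FILE TARGET_FILE
# Hypothetical helper: removes from TARGET_FILE every line whose number
# is listed (one number per line) in LINES_FILE.
delete_lines() {
    lines_file=$1
    target=$2
    # With an empty list, NR==FNR would also be true for the second file
    # and every line would be swallowed, so there is nothing to do here.
    [ -s "$lines_file" ] || return 0
    tmp=$(mktemp) || return 1
    # First file: remember each listed line number as an array key.
    # Second file: print only the lines whose number was not remembered.
    if awk 'NR==FNR { drop[$0]; next } !(FNR in drop)' "$lines_file" "$target" > "$tmp"; then
        # Copy the contents back instead of moving the temp file,
        # so the target keeps its own permissions.
        cat "$tmp" > "$target"
    fi
    rm -f "$tmp"
}
```

It would be called as delete_lines lines.txt documents.txt. The numbers in the list must match exactly, with no surrounding whitespace, because each whole line is used as the array key.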
Well, I don't speak Perl, and I write bash only by painful trial and error. However, Rexx would do this easily:
/* collect the line numbers to delete as a space-separated word list */
lines_to_delete = ""
do while lines( "lines.txt" )
  lines_to_delete = lines_to_delete linein( "lines.txt" )
end
/* copy every line whose number is not in the list */
n = 0
do while lines( "documents.txt" )
  line = linein( "documents.txt" )
  n = n + 1
  if ( wordpos( n, lines_to_delete ) == 0 ) then
    call lineout "temp_out.txt", line
end
This will leave your output in temp_out.txt, which you may rename to documents.txt as desired.
Here's a way to do it with sed:
sed ':a;${s/\n//g;s/^/sed \o47/;s/$/d\o47 documents.txt/;b};s/$/d\;/;N;ba' lines.txt | sh
It uses sed to build a sed command and pipes it to the shell to be executed. The resulting sed command simply looks like sed '3d;5d;11d' documents.txt.
To build it, the outer sed command adds a d; after each number and loops to the next line, branching back to the beginning (N; ba). When the last line is reached ($), all the newlines are removed, sed ' is prepended, and the final d and ' documents.txt are appended. Then b branches out of the :a - ba loop to the end, since no label is specified.
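The same idea can be sketched without piping a generated command line to sh: turn each line number into a one-line sed script and pass that script with -f. This is a variation on the answer above, not the author's command; the function name delete_lines_sed and the temporary script file are assumptions for illustration.

```shell
#!/bin/sh
# delete_lines_sed LINES_FILE TARGET_FILE
# Hypothetical helper: prints TARGET_FILE without the lines whose numbers
# are listed (one number per line) in LINES_FILE.
delete_lines_sed() {
    lines_file=$1
    target=$2
    script=$(mktemp) || return 1
    # Turn every purely numeric line into a delete command, giving a
    # script such as:
    #   3d
    #   5d
    #   11d
    # Non-numeric lines are skipped so that a blank line cannot become a
    # bare "d", which would delete everything.
    sed -n 's/^[0-9][0-9]*$/&d/p' "$lines_file" > "$script"
    # Run the generated script; sed accepts newline-separated commands.
    sed -f "$script" "$target"
    status=$?
    rm -f "$script"
    return $status
}
```

Called as delete_lines_sed lines.txt documents.txt, it writes the filtered text to standard output, so the result still has to be redirected to a new file if documents.txt itself should be replaced.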
Here's how you can do it using join and cat -n (assuming that lines.txt is sorted):
join -t $'\v' -v 2 -o 2.2 lines.txt <(cat -n documents.txt | sed 's/^ *//;s/\t/\v/')
If lines.txt isn't sorted:
join -t $'\v' -v 2 -o 2.2 <(sort lines.txt) <(cat -n documents.txt | sed 's/^ *//;s/\t/\v/')
Edit: Fixed a bug in the join commands in which the original versions only output the first word of each line in documents.txt.