
How to quickly delete the lines in a file that contain items from a list in another file in BASH?

I have a file called words.txt containing a list of words. I also have a file called file.txt containing a sentence per line. I need to quickly delete any lines in file.txt that contain one of the lines from words.txt, but only if the match is found somewhere between { and }.

E.g. file.txt:

Once upon a time there was a cat.
{The cat} lived in the forest.
The {cat really liked to} eat mice.

E.g. words.txt:

cat
mice

Example output:

Once upon a time there was a cat.

The other two lines are removed because "cat" is found on them between { and }.

The following script successfully does this task:

while read -r line
do
    sed -i "/{.*$line.*}/d" file.txt
done < words.txt

This script is very slow. Sometimes words.txt contains several thousand items, so the while loop takes several minutes. I attempted to use the sed -f option, which seems to allow reading commands from a file, but I cannot find any documentation explaining how to use it.

How can I improve the speed of the script?
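
For reference, sed -f reads sed commands from a file rather than a list of words, so words.txt would first have to be turned into one delete command per word. A minimal sketch, assuming the words contain no / or regex metacharacters; delete.sed and file.new are hypothetical file names:

```shell
# Sample data from the question (run this in an empty scratch directory)
printf '%s\n' 'Once upon a time there was a cat.' \
    '{The cat} lived in the forest.' \
    'The {cat really liked to} eat mice.' > file.txt
printf '%s\n' cat mice > words.txt

# Turn each word into a sed delete command, e.g.  /{.*cat.*}/d
# (assumes words contain no "/" or regex metacharacters)
sed 's/.*/\/{.*&.*}\/d/' words.txt > delete.sed

# Apply every command in a single pass, then replace the original file
sed -f delete.sed file.txt > file.new && mv file.new file.txt
```

This makes one pass over file.txt instead of rewriting it once per word, though sed still tries every pattern on each line.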

Village asked Dec 15 '22 22:12


1 Answer

An awk solution:

awk 'NR==FNR{a["{[^{}]*"$0"[^{}]*}"]++;next}{for(i in a)if($0~i)next;b[j++]=$0}END{printf "">FILENAME;for(i=0;i in b;++i)print b[i]>FILENAME}' words.txt file.txt

It rewrites file.txt in place so that it contains the expected output:

Once upon a time there was a cat.

Uncondensed version:

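awk '
    # First file (words.txt): store one regex per word that matches the
    # word only inside a single { } pair.
    NR == FNR {
        a["{[^{}]*" $0 "[^{}]*}"]++
        next
    }
    # Second file (file.txt): skip the line if any regex matches,
    # otherwise buffer it.
    {
        for (i in a)
            if ($0 ~ i)
                next
        b[j++] = $0
    }
    # Truncate file.txt, then write the buffered lines back.
    END {
        printf "" > FILENAME
        for (i = 0; i in b; ++i)
            print b[i] > FILENAME
    }
' words.txt file.txt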
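
With several thousand words, looping over every pattern for every line can itself become the bottleneck. One possible variation, not part of the solution above, is to join all words into a single alternation so each line is tested once. A minimal sketch, assuming words.txt has no empty lines and its words contain no regex metacharacters; filtered.txt is a hypothetical output name, and the braces are written as [{] and [}] as a portability assumption across awk implementations:

```shell
# Sample data from the question (run this in an empty scratch directory)
printf '%s\n' 'Once upon a time there was a cat.' \
    '{The cat} lived in the forest.' \
    'The {cat really liked to} eat mice.' > file.txt
printf '%s\n' cat mice > words.txt

awk '
    NR == FNR {
        # First file: join the words into one alternation, e.g. cat|mice
        words = (words == "" ? "" : words "|") $0
        next
    }
    FNR == 1 {
        # Second file, first line: build the single pattern once
        re = "[{][^{}]*(" words ")[^{}]*[}]"
    }
    $0 !~ re    # print only lines where no word appears inside { }
' words.txt file.txt > filtered.txt
```

Whether this is actually faster depends on the awk implementation and on how large the combined pattern gets, so it is worth timing against the loop version on real data.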

If file.txt may be too large for awk to buffer in memory, we cannot modify the file directly. In that case, only print the remaining lines to stdout:

awk '
    NR == FNR {
        a["{[^{}]*" $0 "[^{}]*}"]++
        next
    }
    {
        for (i in a)
            if ($0 ~ i)
                next
    }
    1
' words.txt file.txt
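
To still end up with an updated file.txt when using the stdout version, the output could be sent to a temporary file that then replaces the original. A minimal sketch, assuming mktemp is available; the template name file.txt.XXXXXX is hypothetical, and the braces are written as [{] and [}] here as a portability assumption rather than something the version above requires:

```shell
# Sample data from the question (run this in an empty scratch directory)
printf '%s\n' 'Once upon a time there was a cat.' \
    '{The cat} lived in the forest.' \
    'The {cat really liked to} eat mice.' > file.txt
printf '%s\n' cat mice > words.txt

# Temporary file created next to file.txt so that mv is a plain rename
tmp=$(mktemp ./file.txt.XXXXXX)

awk '
    NR == FNR {
        # One regex per word, matching only inside a single { } pair
        a["[{][^{}]*" $0 "[^{}]*[}]"]++
        next
    }
    {
        for (i in a)
            if ($0 ~ i)
                next    # drop the line on the first match
    }
    1                   # no pattern matched: print the line
' words.txt file.txt > "$tmp" && mv "$tmp" file.txt
```

The mv only runs if awk succeeds, so a failed run leaves file.txt untouched. Note that the replaced file takes on the permissions mktemp gave the temporary file.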
konsolebox answered May 07 '23 00:05