How to check if one file is part of another?

Question

I need to check, from a bash script, whether one file is contained in another, given a multiline pattern and an input file.

Return value:

I want to receive an exit status (as with the grep command): 0 if any matches were found, 1 if no matches were found.

Pattern:

- multiline,
- order of lines is important (treated as a single block of lines),
- includes characters such as numbers, letters, ?, &, *, # etc.

Explanation

Only the following examples should produce matches:

pattern     file1 file2 file3 file4
222         111   111   222   222
333         222   222   333   333
            333   333         444
            444

The following shouldn't:

pattern     file1 file2 file3 file4 file5 file6 file7
222         111   111   333   *222  111   111   222
333         *222  222   222   *333  222   222   
            333   333*        444   111         333
            444                     333   333

Here's my script:

#!/bin/bash

function writeToFile {
    if [ -w "$1" ] ; then
        echo "$2" >> "$1"
    else
        echo -e "$2" | sudo tee -a "$1" > /dev/null
    fi
}

function writeOnceToFile {
    # pcregrep -M lets the pattern span multiple lines
    if pcregrep --color -M "$2" "$1"; then
        echo "This file contains text that was added previously"
    else
        writeToFile "$1" "$2"
    fi
}

file=file.txt 
#1?1
#2?2
#3?3
#4?4

pattern=`cat pattern.txt`
#2?2
#3?3

writeOnceToFile "$file" "$pattern"

I can run grep for each line of the pattern, but that fails with this example:

file.txt 
#1?1
#2?2
#=== added line
#3?3
#4?4

pattern.txt
#2?2
#3?3

or even if you swap lines 2 and 3:

file=file.txt 
#1?1
#3?3
#2?2
#4?4

It returns 0 when it shouldn't.

How can I fix it? Note that I prefer to use programs that are installed by default (ideally without pcregrep). Maybe sed or awk can solve this problem?

fedorqui 'SO stop harming' · Accepted Answer

I went through the problem again and I think awk can handle this better:

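awk 'FNR==NR {a[FNR]=$0; next}              # first file: store every line by line number
     FNR==1 && NR>1 {for (i in a) len++}    # start of the second file: count the stored lines
     {for (i=last; i<=len; i++) {           # scan forward from the last match position
         if (a[i]==$0)
            {last=i; next}                  # found: move the cursor, read the next pattern line
     } status=1}                            # not found after the cursor: record a failure
     END {print status+0}' file pattern     # print 0 if every pattern line was found, else 1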

The idea is:

- Read the whole of file into memory as an array, a[line_number] = line.
- Count the elements in the array.
- Loop through pattern and check whether the current line occurs in file anywhere between the cursor position and the end of file. If it does, move the cursor to the position where it was found. If it does not, set the status to 1: some line of pattern did not occur in file after the previous match.
- Print the status, which is 0 unless it was set to 1 at some point.
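The program above prints the status rather than returning it, while the question asks for a grep-like exit status. One possible adaptation is sketched below. It is an assumption-laden variant, not the answer's own code: the function name lines_in_order and the awk variable hay are hypothetical, the searched file is loaded with getline instead of the FNR==NR idiom (so that an empty searched file is not mistaken for the pattern), and lines are compared as strings so that values such as 010 and 10 are not treated as equal numbers.

```shell
#!/bin/bash

# Hypothetical helper (name not from the original answer).
# Exit 0 if every line of pattern file $2 occurs in file $1 in the same
# order, exit 1 otherwise. Lines need not be adjacent.
lines_in_order() {
    # Note: awk -v expands backslash escapes, so file names containing
    # backslashes would need extra care.
    awk -v hay="$1" '
        BEGIN {
            # Load the searched file; len ends up as its line count.
            while ((getline line < hay) > 0) a[++len] = line
            last = 1
        }
        {
            # Scan from the cursor; appending "" forces a string comparison.
            for (i = last; i <= len; i++)
                if ((a[i] "") == ($0 "")) { last = i; next }
            exit 1    # this pattern line was not found after the cursor
        }' "$2"
}
```

With this in place, the result can be tested directly, for example: if lines_in_order file.txt pattern.txt; then echo found; fi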

Test

They do match:

$ tail f p
==> f <==
222
333
555

==> p <==
222
333
$ awk 'FNR==NR {a[FNR]=$0; next} FNR==1 && NR>1{for (i in a) len++} {for (i=last; i<=len; i++) {if (a[i]==$0) {last=i; next}} status=1} END {print status+0}' f p
0

They don't:

$ tail f p
==> f <==
333
222
555

==> p <==
222
333
$ awk 'FNR==NR {a[FNR]=$0; next} FNR==1 && NR>1{for (i in a) len++} {for (i=last; i<=len; i++) {if (a[i]==$0) {last=i; next}} status=1} END {print status+0}' f p
1

With seq:

$ awk 'FNR==NR {a[FNR]=$0; next} FNR==1 && NR>1{for (i in a) len++} {for (i=last; i<=len; i++) {if (a[i]==$0) {last=i; next}} status=1} END {print status+0}' <(seq 2 20) <(seq 10)
1
$ awk 'FNR==NR {a[FNR]=$0; next} FNR==1 && NR>1{for (i in a) len++} {for (i=last; i<=len; i++) {if (a[i]==$0) {last=i; next}} status=1} END {print status+0}' <(seq 20) <(seq 10)
0
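The tests above check ordering only: the pattern lines may be separated by other lines in the searched file. The question, however, treats the pattern as a single block, so a file with an extra line between 222 and 333 should not match. A minimal sketch of a stricter, contiguous check follows, assuming both files fit in memory. The function name contains_block and the awk variable patfile are hypothetical and do not come from the answer.

```shell
#!/bin/bash

# Hypothetical helper: exit 0 when the lines of pattern file $2 appear in
# file $1 as one uninterrupted block, exit 1 otherwise.
contains_block() {
    awk -v patfile="$2" '
        BEGIN {
            # Load the pattern lines; n is the block length.
            while ((getline line < patfile) > 0) p[++n] = line
        }
        { f[NR] = $0 }    # store every line of the searched file
        END {
            # Try each possible starting line of the block.
            for (start = 1; start + n - 1 <= NR; start++) {
                for (k = 1; k <= n; k++)
                    if ((f[start + k - 1] "") != (p[k] "")) break
                if (k > n) exit 0    # all n lines matched in a row
            }
            exit 1
        }' "$1"
}
```

Because lines are compared as literal strings, characters such as ?, & and * need no escaping. In the question's script, the pcregrep call could then be replaced by something like: if contains_block "$file" pattern.txt; then ... fi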

Tags: linux, bash, command-line, pcregrep

abrzozowski
