I need to check if one file is inside another file by bash script. For a given multiline pattern and input file.
Return value:
I want to receive status (how in grep command) 0 if any matches were found, 1 if no matches were found.
Pattern:
Explanation
Only the following examples should found matches:
pattern file1 file2 file3 file4
222 111 111 222 222
333 222 222 333 333
333 333 444
444
the following should't:
pattern file1 file2 file3 file4 file5 file6 file7
222 111 111 333 *222 111 111 222
333 *222 222 222 *333 222 222
333 333* 444 111 333
444 333 333
Here's my script:
#!/bin/bash
function writeToFile {
if [ -w "$1" ] ; then
echo "$2" >> "$1"
else
echo -e "$2" | sudo tee -a "$1" > /dev/null
fi
}
function writeOnceToFile {
pcregrep --color -M "$2" "$1"
#echo $?
if [ $? -eq 0 ]; then
echo This file contains text that was added previously
else
writeToFile "$1" "$2"
fi
}
file=file.txt
#1?1
#2?2
#3?3
#4?4
pattern=`cat pattern.txt`
#2?2
#3?3
writeOnceToFile "$file" "$pattern"
I can use grep command for all lines of pattern, but it fails with this example:
file.txt
#1?1
#2?2
#=== added line
#3?3
#4?4
pattern.txt
#2?2
#3?3
or even if you change lines: 2 with 3
file=file.txt
#1?1
#3?3
#2?2
#4?4
returning 0 when it should't.
How do I can fix it? Note that I prefer to use native installed programs (if this can be without pcregrep). Maybe sed or awk can resolve this problem?
I went through the problem again and I think awk
can handle this better:
awk 'FNR==NR {a[FNR]=$0; next}
FNR==1 && NR>1 {for (i in a) len++}
{for (i=last; i<=len; i++) {
if (a[i]==$0)
{last=i; next}
} status=1}
END {print status+0}' file pattern
The idea is:
- Read all the file file
in memory in an array a[line_number] = line
.
- Count the elements in the array.
- Loop through the file pattern
and check if the current line occurs in file
anytime between where the cursor is and the end of the file file
. If it matches, move the cursor to the position where it was found. If it did not, set the status to 1
- that is, there is a line in pattern
that did not occur in file
after the previous match.
- Print the status, that will be 0
unless it was set to 1
anytime before.
They do match:
$ tail f p
==> f <==
222
333
555
==> p <==
222
333
$ awk 'FNR==NR {a[FNR]=$0; next} FNR==1 && NR>1{for (i in a) len++} {for (i=last; i<=len; i++) {if (a[i]==$0) {last=i; next}} status=1} END {print status+0}' f p
0
They don't:
$ tail f p
==> f <==
333
222
555
==> p <==
222
333
$ awk 'FNR==NR {a[FNR]=$0; next} FNR==1 && NR>1{for (i in a) len++} {for (i=last; i<=len; i++) {if (a[i]==$0) {last=i; next}} status=1} END {print status+0}' f p
1
With seq
:
$ awk 'FNR==NR {a[FNR]=$0; next} FNR==1 && NR>1{for (i in a) len++} {for (i=last; i<=len; i++) {if (a[i]==$0) {last=i; next}} status=1} END {print status+0}' <(seq 2 20) <(seq 10)
1
$ awk 'FNR==NR {a[FNR]=$0; next} FNR==1 && NR>1{for (i in a) len++} {for (i=last; i<=len; i++) {if (a[i]==$0) {last=i; next}} status=1} END {print status+0}' <(seq 20) <(seq 10)
0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With