Here's what I want to do:
Search a document for a regex pattern, then check whether the exact text matched by that pattern occurs twice within a line.
Content of file.xml:
(some code) "testen" (more code) >testete<
(some code) "bleiben" (more code) >bleiben<
(some code) "stehen" (more code) >stand<
(some code) "hängen" (more code) >hängten<
...
Now I want to check for .*en
and check whether the exact same word occurs twice in the line. So the output should be:
bleiben
Because testen != testete, stehen != stand, hängen != hängten
Is there a way to do this?
You can handle this search with a single grep by using the pattern .*en.*en (quoted, so the shell cannot glob-expand it):
grep '.*en.*en' your_file
This will output only the lines in which en appears at least twice.
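To see what this pattern does on the sample from the question, here is a small sketch that recreates those four lines in file.xml (the "..." is left out) and runs the quoted grep. Note that it prints the hängen line as well as the bleiben line, because "hängen" and "hängten" each contain en once. It counts occurrences of en, not repeated words.

```shell
#!/bin/sh
# Recreate the sample lines from the question in file.xml.
cat > file.xml <<'EOF'
(some code) "testen" (more code) >testete<
(some code) "bleiben" (more code) >bleiben<
(some code) "stehen" (more code) >stand<
(some code) "hängen" (more code) >hängten<
EOF

# Quote the pattern so the shell cannot glob-expand it.
# Prints every line where "en" occurs at least twice, anywhere in the line:
# here the "bleiben" line and the "hängen" line.
grep '.*en.*en' file.xml
```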
If you need to handle it with two back-to-back greps, you can still use the same pattern in a piped version:
grep '.*en' your_file | grep '.*en.*en'
Also, if you ever want to increase the number of instances in the same line, you can take advantage of grep's -P option and use a Perl regex:
grep -P "(.*en){2}" your_file
With this, you can just change the {2} to however many instances you want to appear in a single line.
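A minimal sketch of changing the count, assuming a grep that supports -P (GNU grep built with PCRE; some builds, such as BSD grep, may not have it). On the sample lines, {2} selects the same two lines as '.*en.*en', while {3} selects none, because no sample line contains en three times.

```shell
#!/bin/sh
# Recreate the sample lines from the question in file.xml.
cat > file.xml <<'EOF'
(some code) "testen" (more code) >testete<
(some code) "bleiben" (more code) >bleiben<
(some code) "stehen" (more code) >stand<
(some code) "hängen" (more code) >hängten<
EOF

# At least two occurrences of "en" per line (same lines as '.*en.*en').
grep -P '(.*en){2}' file.xml

# At least three occurrences: no sample line qualifies, so grep prints
# nothing and exits with status 1, which triggers the fallback message.
grep -P '(.*en){3}' file.xml || echo "no line has en three times"
```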
EDIT (to find lines with exact same word twice)
This is difficult without an extended pattern that can define the boundaries of a word, and your example output doesn't help much here. To keep the example straightforward, we can assume a "word" is any string of the letters a-z that ends with en (note that a-z does not include umlauts such as ä). You can customize this definition as needed:
grep -P "([a-z]+en).*\1" your_file
This will print any line that has a word ending in en that is found again later in the line (the \1 backreference refers back to the captured word).
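Run against the sample lines from the question (again assuming grep -P is available), the backreference pattern keeps only the bleiben line: the captured word must reappear verbatim later in the line, which rules out testen/testete, stehen/stand and hängen/hängten.

```shell
#!/bin/sh
# Recreate the sample lines from the question in file.xml.
cat > file.xml <<'EOF'
(some code) "testen" (more code) >testete<
(some code) "bleiben" (more code) >bleiben<
(some code) "stehen" (more code) >stand<
(some code) "hängen" (more code) >hängten<
EOF

# ([a-z]+en) captures a run of ASCII letters ending in "en";
# .*\1 requires that exact captured text to occur again later in the line.
# Only the "bleiben" line is printed.
grep -P '([a-z]+en).*\1' file.xml
```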
One caveat, related to the word-boundary issue noted above: "bleiben" and "bleiben" are equal, as intended. However, for "ben" and "bleiben" this pattern will also match, because it takes the ending "ben" of "bleiben" as the repeat (so "ben" = "ben"). If this is not acceptable, you will have to establish a stricter word boundary (for example, by not allowing other letters directly before or after the matched word).
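One way to sketch a stricter boundary is PCRE's \b word-boundary anchor around both the captured word and its repeat. The test file boundary.txt below is hypothetical: its first line shares only the suffix "ben", and its second line repeats the whole word. This assumes plain ASCII words; how \b treats umlauts such as ä can depend on the locale and grep version.

```shell
#!/bin/sh
# Hypothetical test file: line 1 only shares the suffix "ben" with "bleiben",
# line 2 repeats the whole word "bleiben".
cat > boundary.txt <<'EOF'
(some code) "ben" (more code) >bleiben<
(some code) "bleiben" (more code) >bleiben<
EOF

# Naive pattern: prints BOTH lines, because "ben" is also found inside "bleiben".
grep -P '([a-z]+en).*\1' boundary.txt

# \b anchors require the captured word and its repeat to be whole words,
# so only the second line is printed.
grep -P '\b([a-z]+en)\b.*\b\1\b' boundary.txt
```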