How can I compute the intersection between two text files in terms of raw text? It doesn't matter whether the solution uses a shell command or is expressed in Python, Elisp, or other common scripting languages.
I know comm and grep -Fxv -f file1 file2. Both assume that I am interested in the intersection of lines, whereas I am interested in the intersection of characters (with a minimum on the number of characters necessary to count as a match).
Bonus points for efficiency.
Example
If file 1 contains
foo bar baz-fee
and file 2 contains
fee foo bar-faa
then I would like to see
foo bar
fee
assuming a minimum match length of 3.
You're looking for Python's difflib module (in the standard library), and in particular difflib.SequenceMatcher.
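As a rough sketch of how that could look for the example in the question: SequenceMatcher.get_matching_blocks() only reports matches that appear in the same order in both inputs, so it would likely miss "fee" here (it follows "foo bar" in file 1 but precedes it in file 2). One possible workaround is to call find_longest_match() repeatedly and mask out whatever has already been matched. The function name common_substrings and the greedy longest-first strategy are my own assumptions about what "intersection" should mean, not something difflib provides directly.

```python
from difflib import SequenceMatcher


def common_substrings(text_a, text_b, min_length=3):
    """Greedily collect common substrings of at least min_length characters.

    Hypothetical helper, not part of difflib: it repeatedly takes the longest
    common substring and masks it in both inputs so it cannot match again.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")

    # Work on lists so matched characters can be replaced by unique sentinels.
    a = list(text_a)
    b = list(text_b)
    found = []

    while True:
        # autojunk=False turns off the "popular element" heuristic, which
        # could otherwise hide frequent characters in long inputs.
        matcher = SequenceMatcher(None, a, b, autojunk=False)
        match = matcher.find_longest_match(0, len(a), 0, len(b))
        if match.size < min_length:
            break

        found.append("".join(a[match.a:match.a + match.size]))

        # Mask the matched region; the tuples differ between the two lists,
        # so a masked position can never take part in a later match.
        for i in range(match.a, match.a + match.size):
            a[i] = ("a", i)
        for j in range(match.b, match.b + match.size):
            b[j] = ("b", j)

    return found


if __name__ == "__main__":
    for piece in common_substrings("foo bar baz-fee", "fee foo bar-faa", 3):
        print(piece)
```

To run it on real files, pass the contents of each file (for instance via pathlib.Path(...).read_text()) as the two arguments. This rebuilds the matcher on every round, so it is not the efficient solution the question asks bonus points for; for large inputs a suffix-automaton or suffix-array approach would probably scale better.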