How can I compute the intersection between two text files in terms of raw text? It doesn't matter whether the solution uses a shell command or is expressed in Python, Elisp, or other common scripting languages.
I know comm and grep -Fxv -f file1 file2. Both assume that I am interested in the intersection of lines, whereas I am interested in the intersection of characters (with a minimum on the number of characters necessary to count as a match).
Bonus points for efficiency.
Example
If file 1 contains
foo bar baz-fee
and file 2 contains
fee foo bar-faa
then I would like to see
foo bar
fee
assuming a minimum match length of 3.
You're looking for Python's difflib module (in the standard library), and in particular difflib.SequenceMatcher.
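As a minimal sketch of how that might look (not necessarily the only way to use the module): get_matching_blocks() on its own only reports matches that appear in the same order in both inputs, so for the example above it would likely miss "fee". One workaround is to call find_longest_match() repeatedly and mask out each match once it has been reported. The function name common_substrings and the sentinel characters "\0" and "\1" are my own assumptions; the sketch assumes those two characters never occur in the input files.

```python
from difflib import SequenceMatcher


def common_substrings(a, b, min_len=3):
    """Return substrings of at least min_len characters shared by a and b.

    Hypothetical helper: matches are found greedily, longest first, and
    each matched span is used only once.
    """
    found = []
    while True:
        # autojunk=False turns off the "popular element" heuristic, which
        # would otherwise ignore frequent characters in long inputs.
        matcher = SequenceMatcher(None, a, b, autojunk=False)
        m = matcher.find_longest_match(0, len(a), 0, len(b))
        if m.size < min_len:
            break
        found.append(a[m.a:m.a + m.size])
        # Mask the matched spans with different sentinels in a and b
        # (assumed absent from the input) so they cannot match again.
        a = a[:m.a] + "\0" * m.size + a[m.a + m.size:]
        b = b[:m.b] + "\1" * m.size + b[m.b + m.size:]
    return found


print(common_substrings("foo bar baz-fee", "fee foo bar-faa", min_len=3))
# ['foo bar', 'fee']
```

For real files, pass in open(path).read() for each argument. Rebuilding the matcher on every pass keeps the code short but is not efficient for large files; if speed matters, a suffix-array or suffix-automaton based approach would probably scale better.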