
Intersection between text files

How can I compute the intersection between two text files in terms of raw text? It doesn't matter whether the solution uses a shell command or is expressed in Python, Elisp, or other common scripting languages.

I know about comm and grep -Fx -f file1 file2. Both assume that I am interested in the intersection of lines, whereas I am interested in the intersection of characters (with a minimum number of characters required to count as a match).

Bonus points for efficiency.

Example

If file 1 contains

foo bar baz-fee

and file 2 contains

fee foo bar-faa

then I would like to see

  • foo bar
  • fee

assuming a minimum match length of 3.

ahmex asked Dec 27 '22 18:12


1 Answer

You're looking for Python's difflib module (in the standard library), and in particular difflib.SequenceMatcher.
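As a rough sketch of how that could look (not the only way to use difflib, and not tuned for speed): SequenceMatcher.get_matching_blocks() only reports blocks that occur in the same order in both inputs, so on the example it would not report "fee", which sits at opposite ends of the two strings. One workaround is to call find_longest_match() repeatedly and blank out each match before searching again. The function name common_substrings and the two sentinel characters are my own choices for illustration, and the sketch assumes neither file contains NUL or \x01 characters.

```python
from difflib import SequenceMatcher


def common_substrings(a, b, min_len=3):
    """Greedily collect common substrings of a and b, longest first.

    Hypothetical helper for illustration. Assumes neither input contains
    the sentinel characters "\x00" or "\x01".
    """
    found = []
    while True:
        # autojunk=False turns off the "popular element" heuristic, which
        # would otherwise ignore frequent characters in long inputs.
        matcher = SequenceMatcher(None, a, b, autojunk=False)
        m = matcher.find_longest_match(0, len(a), 0, len(b))
        if m.size < min_len:
            break
        found.append(a[m.a:m.a + m.size])
        # Overwrite the matched region in each string with a different
        # sentinel so it cannot take part in any later match.
        a = a[:m.a] + "\x00" * m.size + a[m.a + m.size:]
        b = b[:m.b] + "\x01" * m.size + b[m.b + m.size:]
    return found


print(common_substrings("foo bar baz-fee", "fee foo bar-faa"))
# prints ['foo bar', 'fee']
```

To run it on files, read each one into a string first (for example with pathlib.Path("file1").read_text()) and pass the two strings in. The matcher is rebuilt on every pass, so this is probably fine for small files but slow for large ones; a suffix-array or suffix-automaton approach would likely scale better.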

Eli Bendersky answered Jan 07 '23 14:01