
Comparing two .txt files using difflib in Python

Tags: python, difflib

I am trying to compare two text files and output the first string in the comparison file that does not match, but I am having difficulty since I am very new to Python. Can anybody give me an example of how to use this module?

When I try something like:

result = difflib.SequenceMatcher(None, testFile, comparisonFile)

I get an error saying object of type 'file' has no len.

asked Jun 10 '09 at 18:06 by 101010110101


People also ask

What is Difflib in Python?

Source code: Lib/difflib.py. This module provides classes and functions for comparing sequences. It can be used, for example, for comparing files, and can produce information about file differences in various formats, including HTML, context diffs and unified diffs.
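To make the output formats concrete, here is a minimal sketch (the sample lines and the `old.txt`/`new.txt` labels are invented, not real files) that produces a context diff and an HTML side-by-side table for the same pair of inputs:

```python
import difflib

old = ["apple\n", "banana\n", "cherry\n"]
new = ["apple\n", "blueberry\n", "cherry\n"]

# Context diff: the "*** / ---" format, with changed lines marked by "! "
context = list(difflib.context_diff(old, new, fromfile="old.txt", tofile="new.txt"))
print("".join(context))

# HTML: make_table() returns only a side-by-side <table> element as a string
html_table = difflib.HtmlDiff().make_table(old, new, fromdesc="old.txt", todesc="new.txt")
print(html_table.strip().splitlines()[0])
```

`HtmlDiff.make_file()` takes the same leading arguments and returns a complete HTML page instead of just the table.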

How do you print the difference between two files in Python?

The Python standard library has a module, difflib, specifically for finding diffs between strings or files. To get a diff with it, call its unified_diff() function on the two sequences of lines you want to compare.
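A minimal sketch of `unified_diff()` on two in-memory strings; the variable names and the `before.txt`/`after.txt` labels are placeholders, not real files. `splitlines(keepends=True)` turns each string into the list of lines the function expects while keeping the newlines, so the joined output is laid out correctly:

```python
import difflib

before = "one\ntwo\nthree\n"
after = "one\n2\nthree\n"

# unified_diff() works on sequences of lines and yields the diff line by line
diff = difflib.unified_diff(
    before.splitlines(keepends=True),
    after.splitlines(keepends=True),
    fromfile="before.txt",
    tofile="after.txt",
)
result = "".join(diff)
print(result, end="")
```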

How to get differences between two files in Python using difflib?

Method 1: using unified_diff(). Python's difflib module is designed specifically for comparing the differences between files. To get those differences, call difflib.unified_diff() on the lines of the two files.
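For actual files on disk, one way to package this is a small helper; `diff_files` is a hypothetical name, and the sketch assumes both files are text files readable with the default encoding:

```python
import difflib
import sys

def diff_files(path_a, path_b, out=None):
    """Hypothetical helper: write a unified diff of two text files to `out`
    (stdout by default) and return True if the files differ."""
    if out is None:
        out = sys.stdout
    with open(path_a) as fa, open(path_b) as fb:
        a_lines = fa.readlines()
        b_lines = fb.readlines()
    differs = False
    # fromfile/tofile only label the "---" and "+++" header lines
    for line in difflib.unified_diff(a_lines, b_lines, fromfile=path_a, tofile=path_b):
        out.write(line)
        differs = True
    return differs
```

Calling `diff_files("myFile1.txt", "myFile2.txt")` would print the diff and return whether anything changed. If the last line of a file has no trailing newline, its `-`/`+` line will run into the next output line, since `unified_diff()` does not add one.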

How to compare the differences between the files in Python?

Python's difflib module is designed specifically for comparing the differences between files. One way to get those differences is to call its unified_diff() function. The module also provides a class for comparing files, named Differ.
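A short sketch of the `Differ` class (the two line lists are made-up sample data). Unlike `unified_diff()`, `Differ.compare()` emits every line of both inputs, not just the changed hunks with some context:

```python
import difflib

a = ["The quick brown fox\n", "jumps over\n", "the lazy dog\n"]
b = ["The quick brown fox\n", "leaps over\n", "the lazy dog\n"]

# compare() yields each line with a two-character code:
# "  " common to both, "- " only in a, "+ " only in b, "? " intraline hint
result = list(difflib.Differ().compare(a, b))
print("".join(result), end="")
```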

How to compare texts between two texts in Python?

The difflib module is useful for comparing texts and finding the differences between them. It is part of Python's standard library, so nothing extra needs to be installed, and it contains many useful functions for comparing bodies of text. For example, the unified_diff() function pinpoints mismatches between two data files.
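To pinpoint only the mismatching lines, `unified_diff()` accepts `n`, the number of context lines around each change (default 3). With `n=0` each hunk holds just the changed lines. A sketch with invented CSV-like rows and placeholder file labels:

```python
import difflib

expected = ["id,value\n", "1,10\n", "2,20\n", "3,30\n"]
actual = ["id,value\n", "1,10\n", "2,25\n", "3,30\n"]

# n=0 drops the surrounding context, so the hunk shows only the mismatch
diff = list(difflib.unified_diff(expected, actual,
                                 fromfile="expected.csv", tofile="actual.csv", n=0))
print("".join(diff), end="")
```

The `@@` header then gives the line number of each mismatch directly.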

How do I compare two sequences in difflib?

At the core of the difflib module is the SequenceMatcher class, which implements the algorithm for comparing two sequences. All elements of both sequences must be hashable, because the matcher indexes the elements of the second sequence in a dictionary.
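A small sketch of `SequenceMatcher` on two lists of strings (sample data invented), showing `get_opcodes()`, `ratio()`, and what happens with unhashable elements:

```python
import difflib

a = ["alpha", "beta", "gamma", "delta"]
b = ["alpha", "gamma", "delta", "epsilon"]

# Strings are hashable, so lists of strings (e.g. lines of a file) work fine
sm = difflib.SequenceMatcher(None, a, b)
for tag, i1, i2, j1, j2 in sm.get_opcodes():
    print(f"{tag:7} a[{i1}:{i2}]={a[i1:i2]} b[{j1}:{j2}]={b[j1:j2]}")
print(sm.ratio())

# Lists are unhashable, so building the matcher over lists of lists fails
try:
    difflib.SequenceMatcher(None, [[1], [2]], [[1], [3]])
except TypeError as exc:
    print("TypeError:", exc)
```

Each opcode describes how to turn `a[i1:i2]` into `b[j1:j2]` (`equal`, `replace`, `delete` or `insert`), and `ratio()` is 2*M/T, where M is the number of matched elements and T the total number of elements in both sequences.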


3 Answers

For starters, you need to pass strings to difflib.SequenceMatcher, not files:

# Like so
difflib.SequenceMatcher(None, str1, str2)

# Or just read the files in
difflib.SequenceMatcher(None, file1.read(), file2.read())

That'll fix your error anyway. To get the first non-matching string, I'll direct you to the wonderful world of difflib documentation.
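One way to get the first non-matching line, assuming that "string" in the question means "line" and that a line existing only in the test file should not count: compare the two lists of lines with `SequenceMatcher` and return the first `replace` or `insert` opcode on the comparison side. `first_mismatch` is a hypothetical helper, not part of difflib:

```python
import difflib

def first_mismatch(test_lines, comparison_lines):
    """Hypothetical helper: return (index, line) for the first line of
    comparison_lines that is not matched in test_lines, or None."""
    sm = difflib.SequenceMatcher(None, test_lines, comparison_lines)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        # 'replace' and 'insert' cover comparison lines with no match;
        # 'delete' only drops lines from the test side, so it is skipped
        if tag in ("replace", "insert"):
            return j1, comparison_lines[j1]
    return None
```

For the question's files, read each one with `readlines()` (ideally inside a `with` block) and pass the two resulting lists as the arguments.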

answered Sep 22 '22 at 20:09 by Kenan Banks


Here is a quick example of comparing the contents of two files using Python difflib...

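import difflib

file1 = "myFile1.txt"
file2 = "myFile2.txt"

# ndiff() takes two lists of lines and yields each line prefixed with a difference code
with open(file1) as f1, open(file2) as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())

print(''.join(diff), end='')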

answered Sep 21 '22 at 20:09 by Vyke


Are you sure both files exist?

I just tested it and got a perfect result.

To get the results I use something like:

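import difflib

# testFile and comparisonFile are the paths of the two files to compare
with open(testFile) as f1, open(comparisonFile) as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())

# ndiff() returns a generator, so a plain for loop consumes it
for line in diff:
    print(line, end='')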

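The first character of each line indicates whether it differs: '-' means the line appears only in the first file, '+' means it appears only in the second, a space means it is common to both, and '?' marks a hint line pointing at the changes within a line.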
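If you only want the lines that differ, a sketch that filters the `ndiff()` output on those prefixes (the sample lists are invented):

```python
import difflib

a = ["one\n", "two\n", "three\n"]
b = ["one\n", "tree\n", "four\n"]

# Keep only lines unique to one side; drop "  " (common) and "? " (hint) lines
changes = [line for line in difflib.ndiff(a, b) if line.startswith(("- ", "+ "))]
print("".join(changes), end="")
```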

answered Sep 22 '22 at 20:09 by RSabet