
Python: compare two files with different line endings

I have two files, test.a and test.b. test.a was pre-generated on a Unix machine. test.b is generated by the user, on either a Windows or a Unix machine.

I can't use filecmp.cmp('test01/test.a', 'test01/test.b') because it returns False whenever test.b was generated on Windows, purely because of the different line endings.

Is there an elegant solution to this? If not, what would be the best way to change the line endings of the Unix file before comparing?

Thanks!

Jinx asked Apr 12 '14 21:04


People also ask

How do you do a diff in Python?

The Python standard library has a module, difflib, for finding diffs between strings or files. To get a diff, call its unified_diff function on the two sequences of lines.
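As a rough illustration of how difflib could be applied to the line-ending problem, the sketch below splits both texts with str.splitlines() before diffing, so the terminators never reach the comparison. The helper name diff_ignoring_line_endings and the sample strings are made up for this example.

```python
import difflib

def diff_ignoring_line_endings(text_a, text_b):
    # str.splitlines() treats \n, \r\n and \r (among other line boundaries)
    # as separators and drops them, so only the line contents are compared.
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    # lineterm='' because the lines no longer carry a trailing newline.
    return list(difflib.unified_diff(lines_a, lines_b,
                                     fromfile='test.a', tofile='test.b',
                                     lineterm=''))

unix_text = "alpha\nbeta\n"
windows_text = "alpha\r\nbeta\r\n"
changed_text = "alpha\r\ngamma\r\n"

# Same content, different line endings: the diff is empty.
print(diff_ignoring_line_endings(unix_text, windows_text))

# A real content change still shows up in the diff.
for line in diff_ignoring_line_endings(unix_text, changed_text):
    print(line)
```

An empty result means the files match apart from their line endings; anything else is a readable report of what differs.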


2 Answers

Assuming both are text files, the standard open() and readline() functions should work, because unless 'b' is in the mode, Python 3 opens files with universal newlines, translating \r\n and \r to \n on read (on Python 2, use mode 'rU' to get the same behaviour):

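def cmp_lines(path_1, path_2):
    """Return True if both text files hold the same lines, ignoring line-ending style."""
    l1 = l2 = True
    # Text mode ('r') translates \r\n and \r to \n while reading.
    with open(path_1, 'r') as f1, open(path_2, 'r') as f2:
        while l1 and l2:
            l1 = f1.readline()
            l2 = f2.readline()
            # Also catches one file ending early: '' never equals a real line.
            if l1 != l2:
                return False
    return True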

This compares the files line by line and returns False as soon as two lines differ; the with block closes both files either way. If every line matches, it returns True. Because newlines are translated to \n on read, the line-ending style does not affect the result. Note that readline() returns '' at EOF (end of file), which is what ends the loop.
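A quick way to check this behaviour, assuming Python 3, is to write small files in binary mode so the line endings are exactly what you intend, then compare them. The file contents and the write_bytes helper are invented for the demonstration; cmp_lines is repeated so the snippet runs on its own.

```python
import filecmp
import os
import tempfile

def cmp_lines(path_1, path_2):
    # Same logic as the function above, repeated so this snippet is self-contained.
    l1 = l2 = True
    with open(path_1, 'r') as f1, open(path_2, 'r') as f2:
        while l1 and l2:
            l1 = f1.readline()
            l2 = f2.readline()
            if l1 != l2:
                return False
    return True

def write_bytes(path, data):
    # Binary mode writes the bytes untouched, so the line endings stay as given.
    with open(path, 'wb') as f:
        f.write(data)

with tempfile.TemporaryDirectory() as tmp:
    unix_path = os.path.join(tmp, 'test.a')
    windows_path = os.path.join(tmp, 'test.b')
    other_path = os.path.join(tmp, 'test.c')
    write_bytes(unix_path, b'one\ntwo\n')
    write_bytes(windows_path, b'one\r\ntwo\r\n')
    write_bytes(other_path, b'one\r\nthree\r\n')

    byte_equal = filecmp.cmp(unix_path, windows_path, shallow=False)
    same = cmp_lines(unix_path, windows_path)
    different = cmp_lines(unix_path, other_path)

print(byte_equal)  # False: the raw bytes differ
print(same)        # True: only the line endings differ
print(different)   # False: the second line differs
```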

Pi Marillion answered Oct 20 '22 01:10


What if you detected which newline sequence the first line of one file uses and, if it differs from the other file's, replaced every instance of it with the sequence the other file uses, so that you could then use filecmp.cmp? If both files already use the same sequence, no replacement is needed. I know you said you are dealing with large files, so perhaps this wouldn't suit at all.

However, look here regarding detection of the newline character used in a file: "How can I detect DOS line breaks in a file?"

And here regarding the efficiency of search and replace on a large string: "Fastest Python method for search and replace on a large string"
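The detect-then-replace idea might look roughly like the sketch below. It assumes the files only ever use \n or \r\n, and it reads each file fully into memory, so it may not suit very large files. The function names and sample data are hypothetical, not from any library.

```python
import os
import tempfile

def detect_newline(path):
    # Binary readline() stops at b'\n', so a Windows line keeps its b'\r'.
    with open(path, 'rb') as f:
        first_line = f.readline()
    return b'\r\n' if first_line.endswith(b'\r\n') else b'\n'

def read_with_unix_newlines(path):
    # Assumption: the whole file fits in memory.
    with open(path, 'rb') as f:
        data = f.read()
    # Only pay for the replace when the file actually uses \r\n.
    if detect_newline(path) == b'\r\n':
        data = data.replace(b'\r\n', b'\n')
    return data

def same_content(path_1, path_2):
    return read_with_unix_newlines(path_1) == read_with_unix_newlines(path_2)

with tempfile.TemporaryDirectory() as tmp:
    unix_path = os.path.join(tmp, 'test.a')
    windows_path = os.path.join(tmp, 'test.b')
    with open(unix_path, 'wb') as f:
        f.write(b'one\ntwo\n')
    with open(windows_path, 'wb') as f:
        f.write(b'one\r\ntwo\r\n')

    detected = (detect_newline(unix_path), detect_newline(windows_path))
    matches = same_content(unix_path, windows_path)

print(detected)
print(matches)
```

Comparing the normalised bytes in memory avoids rewriting either file on disk; to use filecmp.cmp itself, the normalised data would first have to be written to a temporary file.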

Hope this helps; apologies if not.

Totem answered Oct 20 '22 01:10