 

Get newline stats for a text file in Python

Tags: python, newline

I had a nasty CRLF / LF conflict in a file in git that was probably committed from a Windows machine. Is there a cross-platform way (preferably in Python) to detect which type of newline is dominant in the file?

I've got this code (based on an idea from https://stackoverflow.com/a/10562258/239247):

import sys
if not sys.argv[1:]:
  sys.exit('usage: %s <filename>' % sys.argv[0])

with open(sys.argv[1],"rb") as f:
  d = f.read()
  crlf, lfcr = d.count('\r\n'), d.count('\n\r')
  cr, lf = d.count('\r'), d.count('\n')
  print('crlf: %s' % crlf)
  print('lfcr: %s' % lfcr)
  print('cr: %s' % cr)
  print('lf: %s' % lf)
  print('\ncr-crlf-lfcr: %s' % (cr - crlf - lfcr))
  print('lf-crlf-lfcr: %s' % (lf - crlf - lfcr))
  print('\ntotal (lf+cr-2*crlf-2*lfcr): %s\n' % (lf + cr - 2*crlf - 2*lfcr))

But it gives the wrong stats (for this file):

crlf: 1123
lfcr: 58
cr: 1123
lf: 1123

cr-crlf-lfcr: -58
lf-crlf-lfcr: -58

total (lf+cr-2*crlf-2*lfcr): -116
asked Dec 14 '22 15:12 by anatoly techtonik


1 Answer

import sys


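def calculate_line_endings(path):
    # Order matters: two-byte endings must be checked before one-byte ones.
    endings = [
        b'\r\n',
        b'\n\r',
        b'\n',
        b'\r',
    ]
    counts = dict.fromkeys(endings, 0)

    with open(path, 'rb') as fp:
        # Note: iterating a binary file splits on b'\n' only, so b'\n\r'
        # never matches and a lone b'\r' is only detected at the very end
        # of the file.
        for line in fp:
            for x in endings:
                if line.endswith(x):
                    counts[x] += 1
                    break
    print(counts)


if __name__ == '__main__':
    # Print usage and exit unless exactly one path was given.
    if len(sys.argv) != 2:
        sys.exit('usage: %s <filepath>' % sys.argv[0])

    calculate_line_endings(sys.argv[1])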

For your file, this gives the following counts:

crlf: 1123
lfcr: 0
cr: 0
lf: 0

Is it enough?
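
If you also need the dominant ending, or files that use bare CR, here is a rough sketch of a variation (not part of the code above). The counts in the question overlap: every '\r\n' is also counted once as '\r' and once as '\n', and '\n\r' matches across a blank CRLF line ('\r\n\r\n'). A single regex pass over the raw bytes gives non-overlapping counts instead. The names `NEWLINE_RE`, `NAMES`, `newline_stats` and `dominant_newline` are made up for this illustration, and the lookahead that prefers reading '\n\r\n' as LF followed by CRLF is an assumption about which reading is more likely, since that sequence is ambiguous.

```python
import re
from collections import Counter

# Alternatives are tried left to right at each position, so b'\r\n' is
# consumed as one ending instead of being split into b'\r' and b'\n'.
# The (?!\n) lookahead is an assumption: b'\n\r\n' is treated as LF + CRLF
# rather than LFCR + LF, because CRLF is far more common than LFCR.
NEWLINE_RE = re.compile(rb'\r\n|\n\r(?!\n)|\n|\r')

NAMES = {b'\r\n': 'crlf', b'\n\r': 'lfcr', b'\n': 'lf', b'\r': 'cr'}


def newline_stats(data):
    """Count non-overlapping line endings in a bytes object."""
    counts = Counter({name: 0 for name in NAMES.values()})
    counts.update(NAMES[m] for m in NEWLINE_RE.findall(data))
    return counts


def dominant_newline(data):
    """Return the name of the most frequent ending, or None if there is none."""
    name, n = newline_stats(data).most_common(1)[0]
    return name if n else None
```

To use it on a file, read the bytes first, e.g. `data = open(path, 'rb').read()`, then call `dominant_newline(data)`. This reads the whole file into memory, which should be fine for source files but may not suit very large inputs.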

answered Jan 03 '23 05:01 by sorrat