I had a nasty CRLF / LF conflict in git file that was probably committed from Windows machine. Is there a cross-platform way (preferably in Python) to detect what type of newlines is dominant through the file?
I've got this code (based on idea from https://stackoverflow.com/a/10562258/239247):
import sys
if not sys.argv[1:]:
sys.exit('usage: %s <filename>' % sys.argv[0])
with open(sys.argv[1],"rb") as f:
d = f.read()
crlf, lfcr = d.count('\r\n'), d.count('\n\r')
cr, lf = d.count('\r'), d.count('\n')
print('crlf: %s' % crlf)
print('lfcr: %s' % lfcr)
print('cr: %s' % cr)
print('lf: %s' % lf)
print('\ncr-crlf-lfcr: %s' % (cr - crlf - lfcr))
print('lf-crlf-lfcr: %s' % (lf - crlf - lfcr))
print('\ntotal (lf+cr-2*crlf-2*lfcr): %s\n' % (lf + cr - 2*crlf - 2*lfcr))
But it gives the stats wrong (for this file):
crlf: 1123
lfcr: 58
cr: 1123
lf: 1123
cr-crlf-lfcr: -58
lf-crlf-lfcr: -58
total (lf+cr-2*crlf-2*lfcr): -116
import sys
def calculate_line_endings(path):
# order matters!
endings = [
b'\r\n',
b'\n\r',
b'\n',
b'\r',
]
counts = dict.fromkeys(endings, 0)
with open(path, 'rb') as fp:
for line in fp:
for x in endings:
if line.endswith(x):
counts[x] += 1
break
print(counts)
if __name__ == '__main__':
if len(sys.argv) == 2:
calculate_line_endings(sys.argv[1])
sys.exit('usage: %s <filepath>' % sys.argv[0])
Gives output for your file
crlf: 1123
lfcr: 0
cr: 0
lf: 0
Is it enough?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With