How to convert CRLF to LF on a Windows machine in Python

So I have these templates. They all end in LF, and I can fill some terms in with format and still get LF files by opening them with "wb".

Those templates are used in a deployment script on a windows machine to deploy on a unix server.

Problem is, a lot of people are going to mess with those templates, and I'm 100% sure that some of them will put some CRLF inside.

How could I, using Python, convert all the CRLF to LF?

Heetola, asked Apr 05 '16 09:04


2 Answers

Convert line endings in-place (with Python 3)

Line endings:

  • Windows - \r\n, called CRLF
  • Linux/Unix/MacOS - \n, called LF

Windows to Linux/Unix/MacOS (CRLF ➡ LF)

Here is a short Python script for directly converting Windows line endings to Linux/Unix/MacOS line endings. The script works in-place, i.e., without creating an extra output file.

    # replacement strings
    WINDOWS_LINE_ENDING = b'\r\n'
    UNIX_LINE_ENDING = b'\n'

    # relative or absolute file path, e.g.:
    file_path = r"c:\Users\Username\Desktop\file.txt"

    with open(file_path, 'rb') as open_file:
        content = open_file.read()

    # Windows ➡ Unix
    content = content.replace(WINDOWS_LINE_ENDING, UNIX_LINE_ENDING)

    # Unix ➡ Windows
    # content = content.replace(UNIX_LINE_ENDING, WINDOWS_LINE_ENDING)

    with open(file_path, 'wb') as open_file:
        open_file.write(content)

Linux/Unix/MacOS to Windows (LF ➡ CRLF)

To convert from Linux/Unix/MacOS to Windows instead, simply uncomment the Unix ➡ Windows replacement (remove the # in front of the line).

DO NOT comment out the command for the Windows ➡ Unix replacement, as it ensures a correct conversion. When converting from LF to CRLF, it is important that no CRLF line endings are already present in the file. Otherwise, those lines would be converted to CRCRLF. Converting from CRLF to LF first and then doing the desired conversion from LF to CRLF avoids this issue (thanks @neuralmer for pointing that out).
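The normalize-first step can be sketched as a small helper that works on bytes. The function name to_crlf and the sample data are hypothetical, chosen only for illustration; they are not part of the script above.

```python
WINDOWS_LINE_ENDING = b'\r\n'
UNIX_LINE_ENDING = b'\n'


def to_crlf(content: bytes) -> bytes:
    """Convert line endings to CRLF without doubling existing CRs."""
    # Step 1: normalize any existing CRLF to LF.
    normalized = content.replace(WINDOWS_LINE_ENDING, UNIX_LINE_ENDING)
    # Step 2: every line ending is now a bare LF, so this replacement is safe.
    return normalized.replace(UNIX_LINE_ENDING, WINDOWS_LINE_ENDING)


# Hypothetical input with mixed line endings.
mixed = b'first\r\nsecond\nthird\n'

# Naive one-step conversion: the existing CRLF becomes CRCRLF.
naive = mixed.replace(UNIX_LINE_ENDING, WINDOWS_LINE_ENDING)
print(naive)

# Two-step conversion: every line ends in exactly one CRLF.
print(to_crlf(mixed))
```

Comparing the two printed values shows the extra \r that the one-step replacement leaves behind on lines that already ended in CRLF.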


Code Explanation

Binary Mode

Important: We need to make sure that we open the file both times in binary mode (mode='rb' and mode='wb') for the conversion to work.

When opening files in text mode (mode='r' or mode='w' without b), Python translates line endings automatically: on reading, \r\n (and the lone \r used by old Mac OS versions) is converted to Python's Unix-style line ending \n, and on writing, \n is converted to the platform's native line ending (\r\n on Windows). So the call to content.replace() couldn't find any \r\n line endings to replace.

In binary mode, no such conversion is done. Therefore the call to bytes.replace() can do its work.
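The difference between the two modes can be checked with a short experiment. This is only a sketch: it creates a throwaway temporary file (the name demo_path is hypothetical) and assumes the default newline handling of open in Python 3.

```python
import os
import tempfile

# Write a small file with Windows line endings, byte for byte.
fd, demo_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(b'one\r\ntwo\r\n')

# Text mode: universal newlines turn \r\n into \n while reading.
with open(demo_path, 'r') as f:
    as_text = f.read()

# Binary mode: the bytes come back exactly as stored.
with open(demo_path, 'rb') as f:
    as_bytes = f.read()

os.remove(demo_path)

print(repr(as_text))   # no '\r' left to find
print(repr(as_bytes))  # still contains b'\r\n'
```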

Binary Strings

In Python 3, if not declared otherwise, string literals are Unicode text (type str). But we open our files in binary mode, so the content we read is of type bytes. Therefore we need to add b in front of our replacement strings to make them bytes literals, too.

Raw Strings

On Windows the path separator is a backslash \, which we would need to escape in a normal Python string with \\. By adding r in front of the string we create a so-called "raw string", in which backslashes are taken literally and don't need escaping. So you can directly copy/paste the path from Windows Explorer into your script.

(Hint: Inside Windows Explorer press CTRL+L to automatically select the path from the address bar.)
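As a quick check, the escaped and the raw spelling of the example path produce the same string. The variable names here are only illustrative.

```python
# Two spellings of the same Windows path.
escaped = "c:\\Users\\Username\\Desktop\\file.txt"
raw = r"c:\Users\Username\Desktop\file.txt"

print(escaped == raw)

# Without the r prefix or the doubled backslashes, Python 3 would read the
# backslash followed by "U" in this path as the start of a Unicode escape
# and reject the literal with a SyntaxError.
```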

Alternative solution

We open the file twice to avoid having to reposition the file pointer. We could also have opened the file once with mode='rb+', but then we would have needed to move the pointer back to the start after reading its content (open_file.seek(0)) and truncate its original content before writing the new one (open_file.truncate(0)).

Simply opening the file again in write mode does that automatically for us.
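For comparison, the single-open variant described above could look roughly like this. It is a sketch, not a replacement for the script: the function name crlf_to_lf_in_place and the temporary demo file are assumptions made for the example.

```python
import os
import tempfile


def crlf_to_lf_in_place(file_path):
    """Rewrite file_path with LF line endings using a single open() call."""
    with open(file_path, 'rb+') as open_file:
        content = open_file.read()
        content = content.replace(b'\r\n', b'\n')
        open_file.seek(0)       # move the pointer back to the start
        open_file.truncate(0)   # drop the original content
        open_file.write(content)


# Small demonstration on a temporary file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'a\r\nb\r\n')

crlf_to_lf_in_place(demo_path)

with open(demo_path, 'rb') as f:
    result = f.read()
os.remove(demo_path)

print(result)
```

The truncate step matters because the converted content is shorter than the original; without it, leftover bytes from the old content would remain at the end of the file.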

Cheers and happy programming,
winklerrr

winklerrr, answered Sep 21 '22 13:09


Python 3:

The default newline mode for open is universal: when reading in text mode, it doesn't matter which sort of newline each line has (\n, \r or \r\n), as all of them are translated to \n. You can also request a specific form of newline with the newline argument for open.

Translating from one form to the other is thus rather simple in Python:

    with open('filename.in', 'r') as infile, \
         open('filename.out', 'w', newline='\n') as outfile:
        outfile.writelines(infile.readlines())
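To confirm what ends up on disk, the same approach can be run against a file with mixed line endings and the result inspected in binary mode. This is a sketch: the temporary directory and sample content are assumptions for the demonstration. Note that text mode also decodes and re-encodes the content using the default encoding, which is harmless for the ASCII sample used here.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    in_path = os.path.join(tmp_dir, 'filename.in')
    out_path = os.path.join(tmp_dir, 'filename.out')

    # Create an input file with mixed line endings.
    with open(in_path, 'wb') as f:
        f.write(b'crlf line\r\nlf line\n')

    # Same approach as above: universal newlines in, explicit '\n' out.
    with open(in_path, 'r') as infile, \
         open(out_path, 'w', newline='\n') as outfile:
        outfile.writelines(infile.readlines())

    # Check the raw bytes that were actually written.
    with open(out_path, 'rb') as f:
        converted = f.read()

print(converted)
```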

Python 2:

The open function supports universal newlines via the 'rU' mode.

Again, translating from one form to the other:

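    # Python 2's built-in open() has no newline argument, so the output
    # is opened in binary mode, which writes '\n' unchanged as LF.
    with open('filename.in', 'rU') as infile, \
         open('filename.out', 'wb') as outfile:
        outfile.writelines(infile.readlines())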

(In Python 3, mode U was deprecated and has been removed as of Python 3.11; the equivalent form is newline=None, which is the default.)

Yann Vernier, answered Sep 23 '22 13:09