Detecting type of line breaks

Question

What would be the most efficient (fast and reliable enough) way in JavaScript to determine the type of line breaks used in a text: Unix vs. Windows?

In my Node app I have to read in large utf-8 text files and then process them based on whether they use Unix or Windows line breaks.

When the type of line break is uncertain, I want to decide based on which one is most likely.
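Because the files are large, one option is to inspect only the beginning of each file instead of loading all of it. The sketch below assumes that the first LF is representative of the whole file (the same assumption both answers make) and that sampling the first 64 KiB is enough. `readHead` and `detectLineEndingOfFile` are hypothetical helper names, not part of any library.

```javascript
const fs = require('fs');

// Hypothetical helper: read at most `maxBytes` bytes from the start of a file
// so that line-break detection does not need the whole file in memory.
function readHead(filePath, maxBytes = 64 * 1024) {
    const fd = fs.openSync(filePath, 'r');
    try {
        const buffer = Buffer.alloc(maxBytes);
        const bytesRead = fs.readSync(fd, buffer, 0, maxBytes, 0);
        // CR (0x0D) and LF (0x0A) never occur inside multi-byte UTF-8 sequences,
        // so a chunk cut in the middle of a character is still safe to scan.
        return buffer.subarray(0, bytesRead).toString('utf8');
    } finally {
        fs.closeSync(fd);
    }
}

// Hypothetical helper: returns 'CRLF' if the first LF in the sampled bytes is
// preceded by a CR, otherwise 'LF' (also the default when no LF is sampled).
function detectLineEndingOfFile(filePath, maxBytes) {
    const head = readHead(filePath, maxBytes);
    const i = head.indexOf('\n');
    return i > 0 && head[i - 1] === '\r' ? 'CRLF' : 'LF';
}
```

If the sample might not contain any line break (for example, a file with very long lines), raising `maxBytes` or falling back to reading the full file is a reasonable adjustment.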

UPDATE

See my own answer below for the code I ended up using.

Mir-Ismaili · Accepted Answer

Thanks @Sam-Graham. I tried to produce an optimized version. Also, the output of the function is directly usable (see the example below):

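function getLineBreakChar(string) {
    // No need to check the first character: the LF of a CRLF can't be at index 0
    const indexOfLF = string.indexOf('\n', 1)

    if (indexOfLF === -1) {
        // No LF found: a lone CR means old Mac-style line breaks
        if (string.indexOf('\r') !== -1) return '\r'

        // No line break found: default to LF (see Note 2)
        return '\n'
    }

    // The first LF is preceded by a CR: Windows-style line breaks
    if (string[indexOfLF - 1] === '\r') return '\r\n'

    // Otherwise: Unix-style line breaks
    return '\n'
}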

Note 1: This assumes the string is healthy (it contains only one type of line break).

Note 2: This assumes you want LF as the default line break (when no line break is found).
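If you can't rely on the assumption in Note 1, a separate check can tell you whether a string mixes line-break types before you trust a single answer. This is a sketch using three regex tests; `hasMixedLineBreaks` is a hypothetical name, and the check scans the string, so it is slower than looking at the first LF only.

```javascript
// Hypothetical helper: true if the text contains more than one kind of
// line break (CRLF, lone LF, lone CR), i.e. it is not "healthy" per Note 1.
function hasMixedLineBreaks(text) {
    const hasCRLF = text.includes('\r\n');
    // An LF at the start of the string or after any non-CR character
    const hasLoneLF = /(^|[^\r])\n/.test(text);
    // A CR that is not followed by an LF
    const hasLoneCR = /\r(?!\n)/.test(text);
    return [hasCRLF, hasLoneLF, hasLoneCR].filter(Boolean).length > 1;
}
```

For example, `hasMixedLineBreaks('a\r\nb\n')` is `true`, while `hasMixedLineBreaks('a\r\nb\r\n')` is `false`.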

Usage example:

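const fs = require('fs')

// a and b are positions in `string`: the text between them is replaced
// with a single line break of the same type the file already uses
fs.writeFileSync(filePath,
        string.substring(0, a) +
        getLineBreakChar(string) +
        string.substring(b)
);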
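When the inserted text itself spans several lines, its own line breaks may not match the file. A minimal sketch, assuming you want every break in the inserted snippet converted to the break returned by `getLineBreakChar`; `normalizeLineBreaks` is a hypothetical helper name.

```javascript
// Hypothetical helper: converts every line break in `snippet` (CRLF, LF or
// lone CR) to `eol`. '\r\n' is listed first so a CRLF is replaced as one unit.
function normalizeLineBreaks(snippet, eol) {
    return snippet.replace(/\r\n|\r|\n/g, eol);
}
```

Usage would look like `string.substring(0, a) + normalizeLineBreaks(inserted, getLineBreakChar(string)) + string.substring(b)`, where `inserted` is the multi-line text being added.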

This utility may be useful too:

const getLineBreakName = (lineBreakChar) =>
    lineBreakChar === '\n' ? 'LF' : lineBreakChar === '\r' ? 'CR' : 'CRLF'

Sam-Graham · Answer

You would want to look first for an LF, using source.indexOf('\n'), and then check whether the character before it is a CR, e.g. source[source.indexOf('\n') - 1] === '\r'. This way, you just find the first newline and match against it. In summary:

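function whichLineEnding(source) {
     var temp = source.indexOf('\n');
     // If the first LF is preceded by a CR, the text uses Windows line endings.
     // When no LF exists, temp is -1, source[-2] is undefined, and we return 'LF'.
     if (source[temp - 1] === '\r')
         return 'CRLF'
     return 'LF'
}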

There are two fairly popular npm modules that do this: node-newline and crlf-helper. The first splits the entire string, which is very inefficient in your case. The second uses a regex, which in your case would not be quick enough.

However, given your edit, if you want to determine which line ending is more plentiful, I would use the code from node-newline, as it does handle that case.
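To find which line ending is more plentiful without splitting the string, you can keep using `indexOf` and count. This is a sketch, not node-newline's actual code; `dominantLineEnding` is a hypothetical name, and it assumes lone CRs (old Mac style) can be ignored.

```javascript
// Hypothetical helper: counts CRLF vs. lone LF by walking from one LF to the
// next with indexOf, and returns the more frequent one. Ties (including text
// with no line breaks) return `fallback`. Lone CRs are not counted.
function dominantLineEnding(text, fallback = 'LF') {
    let crlf = 0;
    let lf = 0;
    let pos = text.indexOf('\n');
    while (pos !== -1) {
        if (pos > 0 && text[pos - 1] === '\r') crlf++;
        else lf++;
        pos = text.indexOf('\n', pos + 1);
    }
    if (crlf > lf) return 'CRLF';
    if (lf > crlf) return 'LF';
    return fallback;
}
```

This visits every line break once, so it costs more than stopping at the first LF, but it gives a majority answer when a file mixes both styles.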

Tags: javascript, node.js

vitaly-t
