
How to find a Windows end of line (EOL) character

I have several hundred GB of data that I need to paste together using the Unix paste utility in Cygwin, but it won't work properly if there are Windows EOL characters in the files. The data may or may not have Windows EOL characters, and I don't want to spend the time running dos2unix if I don't have to.

So my question is, in Cygwin, how can I figure out whether these files have Windows (CRLF) EOL characters?

I've tried creating some test data and running

sed -r 's/\r\n//' testdata.txt

But that appears to match regardless of whether dos2unix has been run or not.

Thanks.

Stephen Turner asked Mar 17 '11 23:03




1 Answer

The file(1) utility knows the difference:

$ file * | grep ASCII
2:                                       ASCII text
3:                                       ASCII English text
a:                                       ASCII C program text
blah:                                    ASCII Java program text
foo.js:                                  ASCII C++ program text
openssh_5.5p1-4ubuntu5.dsc:              ASCII text, with very long lines
windows:                                 ASCII text, with CRLF line terminators

file(1) has been optimized to try to read as little of a file as possible, so you may be lucky and drastically reduce the amount of disk IO you need to perform when finding and fixing the CRLF terminators.
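If you would rather have an explicit yes/no test you can script around, the same "look only at the start of the file" idea can be sketched with head and grep. This is a rough sketch, not a replacement for file(1): has_crlf is a hypothetical helper name, the 65536-byte sample size is an arbitrary assumption, and it assumes your grep does not translate line endings on input (older Cygwin builds of grep may need -U/--binary for that).

```shell
#!/bin/sh
# has_crlf FILE [BYTES]
# Hypothetical helper: exit 0 if any line within the first BYTES bytes
# (default 65536, an arbitrary sample size) of FILE ends in CR LF.
has_crlf() {
    cr=$(printf '\r')                      # a literal carriage return
    # grep sees lines split on LF, so "CR at end of line" means CR LF.
    head -c "${2:-65536}" "$1" | grep -q "${cr}\$"
}

# Print only the names of the files that would need dos2unix.
for f in "$@"; do
    if has_crlf "$f"; then
        printf '%s\n' "$f"
    fi
done
```

Saved under some name of your choosing and run with the data files as arguments, it prints one name per line for each file that has a CRLF in the sampled region. A file whose first CRLF appears after the sample will be missed, so raise the byte count if your files might be mixed.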

Note that some cases of CRLF should stay in place: captures of SMTP will use CRLF. But that's up to you. :)

sarnold answered Sep 30 '22 17:09